System and method for disseminating topology and link-state information to routing nodes in a mobile ad hoc network

ABSTRACT

Described is a link-state routing protocol used in a mobile ad hoc network or in an Internet for disseminating topology and link-state information throughout the network. Reverse-path forwarding is used to broadcast each update along the minimum-hop-path tree rooted at the source of the update. Each path tree has the source node as a root node, a parent node, and zero or more children nodes. Updates are received from the parent node in the path tree for the source node that originates the update. Each update includes information related to a link in the network. A determination is made whether to forward the update message to children nodes, if any, in the path tree maintained for the source node originating the update in response to information in the received update. This information itself can indicate whether the update is to be forwarded to other nodes.

RELATED APPLICATIONS

[0001] This application claims the benefit of the filing date of co-pending U.S. Provisional Application, Serial No. 60/232,047, filed Sep. 12, 2000, entitled “Techniques for Improved Topology Broadcast Based on Reverse-Path Forwarding” and co-pending U.S. Provisional Application, Serial No. ______, filed Nov. 14, 2000, entitled “Efficient Routing Protocols for Packet-Radio Networks Based on Tree Sharing”, the entirety of which provisional applications is incorporated by reference herein.

BACKGROUND

[0002] A network is a collection of communications entities (e.g., hosts, routers, and gateways) that are in communication with each other over communication links. Organizing communications entities into networks increases the capabilities of the communications entities beyond that of which each communications entity alone is capable by enabling such entities to share resources. A network that interconnects communications entities within a common geographical area (for example, the personal computers in an office) is called a local area network (LAN). Some LANs employ one or more network servers that direct the flow of data within the network and control access to certain network functions such as storing data in a central file repository, printing, and accessing other networks. In other LANs, computers communicate with each other without the use of servers.

[0003] A wide area network (WAN), of which the Internet is an example, is a collection of geographically distributed LANs joined by long-range communication links. The Internet is a publicly accessible, worldwide network of networks based upon a transmission protocol known as TCP/IP (Transmission Control Protocol/Internet Protocol). Communication on the Internet is packet-switched; that is, the information that is to pass from one communications entity to another is broken into packets that are individually passed from router to router until the packets arrive at their destination. The TCP divides the data into segments and provides reliable delivery of bytes in the segments to the destination, which reconstructs the data. The IP further subdivides the TCP segments into packets and routes the packets to their final destination. The route taken by packets may pass through one or more networks, depending upon the Internet Protocol (IP) address of the destination.

[0004] A rapidly growing part of the Internet is the World Wide Web (“Web”), which operates according to a client-server model. Client software, commonly referred to as a Web browser, runs on a computer system. After establishing an Internet connection, the client user launches the Web browser to communicate with a Web server on the Internet. Using TCP/IP, the Web browser sends HTTP (Hypertext Transfer Protocol) requests to the Web server. The request traverses the Internet's TCP/IP infrastructure to the Web host server as HTTP packets.

[0005] A private network based on Internet technology and consisting of a collection of LAN and WAN components is called an intranet. Accordingly, communications entities that are part of an intranet can use a Web browser to access Web servers that are within the intranet or on the Internet.

[0006] Today, most of the communication links between the various communications entities in a network are wire-line; that is, client systems are typically connected to a server and to other client systems by wires, such as twisted-pair wires, coaxial cables, fiber optic cables, and the like. Wireless communication links, such as microwave links, radio frequency (RF) links, infrared (IR) links, and satellite links, are becoming more prevalent in networks.

[0007] A characteristic of wireless networks is that the communication entities in the network are mobile. Such mobility creates frequent, dynamic changes to the network topology and state of the communication links between the communication entities. Mobility is less of a concern for those communication entities connected to the Internet by wire-line; however, the topology of the Internet is perpetually changing, with communication entities joining and leaving the Internet often. Also, the state of communication links between communication entities on the Internet may change for various reasons, such as increased packet traffic.

[0008] To effectively route messages through such dynamically changing networks, routers need to remain informed of topology and link-state changes. Existing methods based on flooding are inefficient and consume too much network bandwidth. The inefficiency of flooding is the result, in part, of the following redundancies: (1) link-state and topology updates are sent over multiple paths to each router; and (2) every router forwards every update to all neighboring routers, even if only a small subset of the neighboring routers need to receive it.

[0009] Thus, there remains a need for a mobile wireless network that can perform reliably and efficiently despite the aforementioned difficulties associated with the mobility of the communication entities in the network.

SUMMARY OF THE INVENTION

[0010] In one aspect, the invention features a method for disseminating topology and link-state information over a multi-hop network comprised of nodes. A path tree is maintained for each source node in the network that can produce an update message. The path tree associated with each node can be a minimum-hop-path tree. Link-state information obtained from one or more nodes in the path tree maintained for a given source node is used in developing the path tree to that source node. Each path tree has that source node as a root node, a parent node, and zero or more children nodes. An update message is received from the parent node in the path tree maintained for the source node that originated the update message. The update message includes information related to a link in the network. A determination is made whether to forward the update message to children nodes, if any, in the path tree maintained for the source node originating the update message in response to information in the received update message. In one embodiment, the information related to the link indicates whether the update message is to be forwarded to other nodes. The link can be a wireless communication link.

[0011] In one embodiment, a new parent message is sent to a node, which selects that node as a new parent node for the source node originating the update message. In response to the new parent message, link-state information associated with the source node that originated the update message is received from the new parent node. The new parent message can include a serial number. The link-state information then received in response to the new parent message is associated with update messages having serial numbers that are greater than the serial number included in the new parent message.
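The serial-number filtering described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `updates_for_new_child` and the representation of a stored update as a `(u, v, cost, sn)` tuple are assumptions made for this example.

```python
def updates_for_new_child(stored_updates, serial_number):
    """Select the link-state updates a new parent would send to a new child.

    stored_updates: list of (u, v, cost, sn) tuples held by the new parent
                    for one source node (hypothetical representation).
    serial_number:  the serial number carried in the NEW PARENT message.
    Only updates with a strictly greater serial number are returned.
    """
    return [update for update in stored_updates if update[3] > serial_number]


# Hypothetical updates held by the new parent for source node "S".
stored = [("S", "X", 1, 4), ("S", "Y", 1, 7), ("S", "Z", 1, 9)]
newer = updates_for_new_child(stored, 7)
```

Under these assumptions, a child reporting serial number 7 would receive only the update with serial number 9, so information the child already holds is not resent.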

[0012] In another embodiment, a path through a new parent node for the source node originating the update message may have the same number of node hops as the path through the current parent node. In this case, the current parent node is maintained as the parent node for the given source node. In still another embodiment, a path to the source node originating the update message may cease to exist. In this case, the current parent node is maintained as the parent node for the source node in the event that the path to the source node is recovered.
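The two parent-retention rules above can be illustrated with a short sketch. The function name `select_parent`, the dictionary of hop counts per neighbor, and the alphabetical choice among equally good new candidates are assumptions introduced for this example only.

```python
def select_parent(current_parent, hops_via_neighbor):
    """Choose the parent for one source node.

    current_parent:    the neighbor currently used as parent (or None).
    hops_via_neighbor: {neighbor: hop count to the source through that
                       neighbor}; empty when no path to the source exists.
    """
    if not hops_via_neighbor:
        # Path to the source ceased to exist: keep the current parent
        # in case the path is later recovered.
        return current_parent
    best = min(hops_via_neighbor.values())
    if hops_via_neighbor.get(current_parent) == best:
        # Tie with (or no improvement over) the current parent: keep it.
        return current_parent
    # Assumption: pick the alphabetically first neighbor among the best.
    return min(n for n, h in hops_via_neighbor.items() if h == best)
```

Keeping the current parent on a tie avoids sending NEW PARENT and CANCEL PARENT messages when the change would not shorten the path.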

[0013] When forwarding the update message to children nodes, the update message may be broadcast to the children nodes if the number of children nodes exceeds a predefined threshold. Alternatively, the update message may be transmitted to each child node using a unicast mode if the number of children nodes is less than a predefined threshold.
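A minimal sketch of this threshold test follows. The function name, the default threshold of 2, and the treatment of a child count exactly equal to the threshold (handled here as unicast) are assumptions; the text specifies only the "exceeds" and "less than" cases.

```python
def choose_transmission_mode(num_children, threshold=2):
    """Return how an update should be forwarded to the children nodes.

    threshold is a hypothetical predefined value. A count equal to the
    threshold is treated as unicast, which is an assumption of this sketch.
    """
    if num_children == 0:
        return "none"        # no children: nothing to forward
    if num_children > threshold:
        return "broadcast"   # one transmission reaches every child
    return "unicast"         # one transmission per child
```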

[0014] In another aspect, the invention features a network comprising a plurality of nodes in communication with each other over communication links. Each node maintains a path tree for each source node in the network that can produce an update message. Each path tree has that source node as a root node, a parent node, and zero or more children nodes. One of the nodes (i) receives an update message from the parent node in the path tree maintained for the source node that originated the received update message, the update message including information related to a link in the network, and (ii) determines whether to forward the update message to children nodes, if any, in the path tree maintained for the source node that originated the update message in response to the information in the received update message.

BRIEF DESCRIPTION OF THE DRAWINGS

[0015] The invention is pointed out with particularity in the appended claims. The objectives and advantages of the invention described above, as well as further objectives and advantages of the invention, may be better understood by reference to the following description taken in conjunction with the accompanying drawings, in which:

[0016] FIG. 1 is a block diagram of an embodiment of a mobile internetworking system including a plurality of subnets in communication with the Internet;

[0017] FIG. 2 is a block diagram of a portion of an embodiment of a protocol stack that can be implemented by each of the routing nodes in each subnet to communicate in accordance with the principles of the invention;

[0018] FIG. 3 is a flow diagram illustrating an embodiment of a process by which each routing node selects a parent neighbor node and children neighbor node(s) for each potential source node in the subnet to define a minimum-hop-path tree for each potential source node along which routing nodes receive and forward link-state updates originating from that source node;

[0019] FIG. 4 is a diagram illustrating an embodiment of an exemplary minimum-hop-path tree for the nodes in the subnet of FIG. 1;

[0020] FIG. 5 is a block diagram illustrating the operation of a partial topology version of the TBRPF protocol;

[0021] FIG. 6 is a diagram of an embodiment of a format of a message header for an atomic TBRPF protocol message;

[0022] FIG. 7 is a diagram of an embodiment of a format of a compound TBRPF message;

[0023] FIGS. 8A and 8B are diagrams of embodiments of a format of a NEW PARENT protocol message;

[0024] FIG. 9 is a diagram of an embodiment of a format for a CANCEL PARENT message;

[0025] FIGS. 10A and 10B are diagrams of embodiments of exemplary formats for link-state update messages;

[0026] FIG. 11 is a diagram of an embodiment of an exemplary format of a RETRANSMISSION_OF_BROADCAST message;

[0027] FIG. 12 is a flow diagram of an embodiment of a process performed by the nodes of the subnet to achieve neighbor discovery;

[0028] FIG. 13 is a diagram of a packet format for the protocol messages used for neighbor discovery;

[0029] FIGS. 14 are a flow diagram of another embodiment of a process for performing neighbor discovery;

[0030] FIG. 15A is a diagram of a format for an IPv6 address including a prefix and an interface identifier;

[0031] FIG. 15B is a diagram of an embodiment of the interface identifier including a 24-bit company identifier concatenated with a 40-bit extension identifier;

[0032] FIG. 15C is a diagram of an embodiment of the interface identifier including a 24-bit company identifier concatenated with the 40-bit extension identifier;

[0033] FIG. 15D is a diagram of an IPv6-IPv4 compatibility address;

[0034] FIG. 15E is a diagram of an embodiment of a message format for tunneling an IPv6-IPv4 compatibility address through IPv4 infrastructure;

[0035] FIG. 16 is a flow diagram of an embodiment of a process by which a router tests an IPv6-IPv4 compatibility address;

[0036] FIG. 17 is a flow diagram of an embodiment of a process by which a mobile node and a server exchange messages; and

[0037] FIGS. 18A and 18B are diagrams illustrating an example of the operation of a message queue.

DESCRIPTION OF THE INVENTION

[0038] FIG. 1 shows an embodiment of an internetworking system 2 including communication sub-networks (“subnets”) 10, 20 that are components of a worldwide network of networks 30 (i.e., the “Internet”). The Internet 30 includes communications entities (e.g., hosts and routers) that exchange messages according to an Internet Protocol (IP) such as IPv4 (version 4) and IPv6 (version 6). On the Internet 30, entities implementing IPv6 may coexist with IPv4 entities. In general, the IPv4 and IPv6 versions are incompatible. The incompatibility is due, in part, to the difference in addressing format: IPv4 specifies a 32-bit address format, whereas IPv6 specifies a 128-bit address format.

[0039] A server 40 is connected to the Internet 30 by a wire-line or wireless connection. The server 40 can be internal or external to the subnet 10. For purposes such as hosting application programs, delivering information or Web pages, hosting databases, handling electronic mail (“e-mail”), or controlling access to other portions of the Internet 30, the server 40 is a computer system that typically handles multiple connections to other entities (e.g., client systems) simultaneously. Although represented as a single server 40, other embodiments can have a group of interconnected servers. The data on the server 40 are replicated on one or more of these interconnected servers to provide redundancy in the event that a connection to the server 40 cannot be established.

[0040] Each subnet 10, 20 includes one or more networks that can include both local area network (LAN) and wide area network (WAN) components. Each subnet 10, 20 may be a freely accessible component of the public Internet 30, or a private intranet. The subnet 10 includes IP hosts 12, routers 14, and a gateway 16 (collectively referred to as nodes 18). As used hereafter, a router 14 is any node 18 that forwards IP packets not explicitly addressed to itself, and an IP host 12 is any node 18 that is not a router 14. Examples of devices that can participate as a node 18 in the subnet 10 include laptop computers, desktop computers, wireless telephones, personal digital assistants (PDAs), network computers, television sets with a service such as Web TV, client computer systems, and server computer systems. The gateway 16 is a particular type of routing node 14 that connects the subnet 10 to the Internet 30. The subnet 20 is similarly configured with nodes 18′ (i.e., hosts 12′, routers 14′, and gateways 16′).

[0041] The subnet 10 can be associated with one organization or administrative domain, such as an Internet service provider (ISP), which associates each node 18 with an assigned IPv6 or IPv4 network address. Each IPv6 address is globally unique, whereas each IPv4 address is locally unique at least within the subnet 10, and may be globally unique. Presumably, the assigned IP address has some topological relevance to the “home” subnet 10 of the node 18 so that the nodes 18 of the subnet 10 can be collectively identified by a common address prefix for routing purposes (called address aggregation). In one embodiment, the gateway 16 is a dual-stack node; that is, the gateway 16 has two IP addresses, an IPv6 address and an IPv4 address, and can route packets to IPv4 and IPv6 nodes.

[0042] Although it is conceivable that all nodes 18 in subnet 10 are initially assigned network addresses that follow a common address convention and have a common network prefix, dynamic topology changes may result in nodes 18 leaving their home subnet 10 to join a “foreign” subnet (e.g., subnet 20) and new nodes joining the home subnet 10. Because each node 18 maintains the same IP address irrespective of whether that node 18 changes its location within the subnet 10 or moves to the foreign subnet 20, mobility may result in a heterogeneous conglomerate of IPv6 and IPv4 addresses, having various network prefixes, within the single subnet 10 unless some form of dynamic address assignment or other address-renumbering scheme is used. Further, the gradual transition from the use of IPv4 network addresses to IPv6 network addresses within the subnet 10 increases the likelihood of such a heterogeneous conglomeration. Thus, like the Internet 30, the infrastructure of the subnet 10 can become heterogeneous; some nodes 18 can be IPv4 nodes, while others are IPv6 nodes.

[0043] In the subnet 10, each node 18 can establish connectivity with one or more other nodes 18 through broadcast or point-to-point links. In general, each link is a communication facility or medium over which nodes 18 can communicate at the link layer (i.e., the protocol layer immediately below the Internet Protocol layer). Such communication links can be wire-line (e.g., telephone lines) or wireless; thus, nodes 18 are referred to as wireless or wire-line depending upon the type of communication link that the node 18 has to the subnet 10. Examples of wireless communication links are microwave links, radio frequency (RF) links, infrared (IR) links, and satellite links. Protocols for establishing link layer links include Ethernet, PPP (Point-to-Point Protocol), X.25, Frame Relay, and ATM (asynchronous transfer mode). Each wireless node 18, e.g., IP host A 12, has a range 22 of communication within which that node 18 can establish a connection to the subnet 10. When beyond the range 22 of communication, the IP host A 12 cannot communicate with the server 40 on the Internet 30 or with other nodes 18 in the subnet 10.

[0044] Each broadcast link connecting multiple nodes 18 is mapped into multiple point-to-point bi-directional links. A pair of nodes 18 is considered to have established a bi-directional link if each node 18 can reliably receive messages from the other. For example, IP host A 12 and node B 14 have established a bi-directional link 24 if and only if IP host A 12 can receive messages sent from node B 14 and node B 14 can receive messages sent from IP host A 12 at a given instant in time. Nodes 18 that have established a bi-directional link are considered to be adjacent (i.e., neighboring nodes). Such a bi-directional link 24 between the two nodes A and B is represented by a pair of unidirectional links (A, B) and (B, A). Each link has at least one positive cost (or metric) that can vary in time, and for any given cost, such cost for the link (A, B) may be different from that for the link (B, A). Any technique for assigning costs to links can be used to practice the invention. For example, the cost of a link can be one, for minimum-hop routing, or the link delay plus a constant bias.

[0045] In one embodiment, the subnet 10 is a mobile “ad hoc” network (“MANET”) in that the topology of the subnet 10 and the state of the links (i.e., link state) between the nodes 18 in the subnet 10 can change frequently because several of the nodes 18 are mobile. That is, each mobile node 18 may move from one location to another location within the same subnet 10 or to another subnet 20, dynamically breaking existing links and establishing new links with other nodes 18, 18′ as a result. Such movement by one node 18 does not necessarily result in breaking a link, but may diminish the quality of the communications with another node 18 over that link. In this case, a cost of that link has increased. Movement that breaks a link may interrupt any on-going communications with other nodes 18 in the subnet 10 or in the foreign subnet 20, or with servers (e.g., server 40) connected to the Internet 30. In another embodiment, the position of every node 18 in the subnet 10 is fixed (i.e., a static network configuration in which no link state changes occur due to node mobility). As the principles of the invention apply to both static and dynamic network configurations, a reference to the subnet 10 contemplates both types of network environments.

[0046] The following example illustrates the operation of the subnet 10. Consider, for example, that node A 12 is communicating with the server 40 over a route through subnet 10 that includes the link (A, B) to node B 14, when node A 12 moves from its present location. This movement breaks the communication link with node B 14 and, as a result, interrupts communications with the server 40. The relocation of node A 12 may break a link with one or more other nodes 18 as well. As one example, the movement by node A 12 may temporarily take node A 12 out of communication range with node B 14, and upon returning within range, node A 12 can reestablish the broken link 24 with node B 14. In this example, the link 24 is intermittent. As another example, node A 12 may move to a different location within the subnet 10 altogether and establish a bi-directional link 26 with a different node (e.g., here node H). In yet another example, node A 12 may move to the foreign subnet 20 and establish a bi-directional link 28 with a node 14′ in the subnet 20 (e.g., node M 14′).

[0047] Each router 14 in the subnet 10 is responsible for detecting, updating, and reporting changes in cost and up-or-down status of each outgoing communication link to neighbor nodes. Thus, each router 14 in the subnet 10 runs a link-state-routing protocol for disseminating subnet topology and link-state information to the other routers 14 in the subnet 10. Each router 14 also executes a neighbor discovery protocol for detecting the arrival of new neighbor nodes and the loss of existing neighbor nodes. To achieve discovery, IP hosts 12 connected to the subnet 10 also run the neighbor discovery protocol. IP hosts 12 can also operate as routers by running the link-state-routing protocol (in the description, such routing IP hosts are categorically referred to as routers 14). The link-state-routing protocol, referred to as a topology broadcast based on reverse-path forwarding (TBRPF) protocol, seeks to substantially minimize the amount of update and control traffic required to maintain shortest (or nearly shortest) paths to all destinations in the subnet 10.

[0048] In brief, the TBRPF protocol performed by each of the routers 14 in the subnet 10 operates to inform a subset of the neighboring routers 14 in the subnet 10 of the current network topology and corresponding link-state information. Thus, for the examples above, each router 14 in the subnet 10 that detects a change in a link to node A 12 (e.g., node B 14 detecting a change in the cost of the link (B, A)) operates as the source (i.e., source node) of an update. Each source node sends a message to a neighbor of that source node, informing the neighbor of the update to that link. Each router 14 receiving the update may subsequently forward the update to zero or more neighbor nodes, until the change in the topology of the subnet 10 disseminates to the appropriate routers 14 in the subnet 10.

[0049] To transmit update messages, the TBRPF protocol supports unicast transmissions (e.g., point-to-point or receiver directed), in which a packet reaches only a single neighbor, and broadcast transmissions, in which a single packet is transmitted simultaneously to all neighbor nodes. In particular, the TBRPF protocol allows an update to be sent either on a common broadcast channel or on one or more unicast channels, depending on the number of neighbors that need to receive the update.

[0050] Upon recovering the same link to node B 14, or upon establishing a new link to another node 18 in the same subnet 10 or in the foreign subnet 20, the node A 12 can resume the interrupted communications with server 40. In effect, one of the nodes 18, 18′ in the subnet 10, 20, respectively, using the neighbor discovery protocol, discovers node A 12 and, using the TBRPF protocol, initiates dissemination of topology and link-state information associated with the link to node A 12. The routers 14 also use the TBRPF protocol to disseminate this information to the other routers in the respective subnet 10 so that one or more routes to the node A 12 become available for communication with the server 40.

[0051] In one embodiment, such communications resume at their point of interruption. In brief, node A 12 maintains, in local cache, copies of objects that are located on the server 40. When node A 12 and the server 40 are in communication, node A 12 updates the objects as necessary, thereby maintaining substantially up-to-date copies of the objects. Thus, when node A 12 moves out of the communication range 22 with the subnet 10, the node A 12 initially has up-to-date information. Then when node A 12 reconnects to the subnet 10, the server 40 forwards previously undelivered updates to the objects locally stored at node A 12, along a route determined by information stored at each of the routing nodes 14. In the event node A 12 reconnects to the foreign subnet 20, a hand-off protocol, such as Mobile IP, is used to achieve the redirection of the messages between the server 40 and the node A 12.

[0052] The route taken by the object updates may traverse a heterogeneous IPv6/IPv4 infrastructure. Normally, IPv6 nodes are unable to route packets to other IPv6 nodes 18 over routes that pass through IPv4 infrastructure. In one embodiment, described in more detail below, the nodes 18 use an IPv6-IPv4 compatible aggregatable global unicast address format to achieve such routing. This IPv6-IPv4 compatibility address format also enables incremental deployment of IPv6 nodes 18 that do not share a common multiple access data-link with another IPv6 node 18.

[0053] Accordingly, the internetworking system 2 provides various mobile ad hoc extensions to the Internet 30 that are particularly suited to the dynamic environment of mobile ad hoc networks. Such extensions, which are described further below, include techniques for (1) disseminating update information to nodes 18 in the subnet 10 using the TBRPF protocol; (2) detecting the appearance of new neighbor nodes and the disappearance of existing neighbor nodes using a neighbor discovery protocol; (3) establishing an address format that facilitates deployment of IPv6 nodes in a predominantly IPv4 network infrastructure; (4) updating information upon resuming communications between nodes; and (5) adaptively using network bandwidth to establish and maintain connections between nodes 18 and the server 40.

[0054] FIG. 2 shows a portion of an embodiment of a protocol stack 50 that can be used by each of the routing nodes 14, 14′ to communicate with other routing nodes 14 in the subnet 10, 20 and on the Internet 30, and thereby implement the various extensions to the Internet 30 described herein. The protocol stack 50 includes a data-link layer 54, a network layer 62, and a transport layer 70.

[0055] The data-link layer 54 can be implemented by any conventional data link protocol (e.g., IEEE 802.11) with an addressing scheme that supports broadcast, multicast, and unicast addressing with best-effort (not guaranteed) message delivery services between nodes 18 having instantaneous bi-directional links. For such implementations, each node 18 in the subnet 10 has a unique data link layer unicast address assignment.

[0056] The network layer 62 is the protocol layer responsible for assuring that packets arrive at their proper destination. Some of the mobile ad hoc extensions for the Internet 30 described herein operate at the network layer 62, such as the TBRPF protocol 58 and the IPv6-IPv4 compatibility address format, described in more detail below. Embodiments that redirect communications from foreign subnets to home subnets also use hand-off mechanisms such as Mobile IP, which operate at the network layer 62. At the transport layer 70, other mobile ad hoc extensions to the Internet 30 are implemented, such as techniques for updating communications upon restoring connections between nodes and for adaptively using the network bandwidth.

[0057] 1. Topology Broadcast Based on Reverse-Path Forwarding (TBRPF) Protocol

[0058] In brief, the TBRPF protocol uses the concept of reverse-path forwarding to broadcast each link-state update in the reverse direction along a tree formed by the minimum-hop paths from all routing nodes 14 to the source of the update. That is, each link-state update is broadcast along the minimum-hop-path tree rooted at the source (i.e., source node “src”) of the update. The minimum-hop-path trees (one tree per source) are updated dynamically using the topology and link-state information that are received along the minimum-hop-path trees themselves. In one embodiment, minimum-hop-path trees are used because they change less frequently than shortest-path trees that are determined based on a metric, such as delay. Other embodiments of the TBRPF protocol can use other types of trees, such as shortest-path trees, to practice the principles of the invention.
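The reverse-path check at the heart of this scheme can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `forward_targets` and the per-source dictionaries `parent` and `children` are hypothetical, and the sketch covers only the check that an update arrived from the parent, not any further forwarding decision based on the content of the update.

```python
def forward_targets(source, sender, parent, children):
    """Return the children to which an update from `source` is forwarded.

    parent:   {source: neighbor selected as parent for that source}
    children: {source: set of neighbors that selected this node as parent}
    An update is accepted only if `sender` is this node's parent in the
    tree rooted at `source`; otherwise nothing is forwarded.
    """
    if parent.get(source) != sender:
        return []
    return sorted(children.get(source, ()))


# Hypothetical state at one node for source "S".
parent = {"S": "B"}
children = {"S": {"D", "C"}}
```

With this state, an update for source S arriving from B is passed on to C and D, while the same update arriving from any other neighbor is not forwarded, which removes the duplicate deliveries that flooding produces.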

[0059] Based on the information received along the minimum-hop-path trees, each node 18 in the subnet 10 computes a parent node and children nodes, if any, for the minimum-hop-path tree rooted at each source node src. Each routing node 14 may receive and forward updates originating from a source node src along the minimum-hop-path tree rooted at that source node src. Each routing node 14 in the subnet 10 also engages in neighbor discovery to detect new neighbor nodes and the loss of existing neighbor nodes. Consequently, the routing node 14 may become the source of an update and thus may generate an update message. When forwarding data packets to a destination node, each routing node 14 selects the next node on a route to the destination.
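One way a node could derive its parent for every source from the links it knows about is a breadth-first search, sketched below. This is an illustrative assumption rather than the procedure of Appendix A: the function name `min_hop_parents`, the list-of-pairs link representation, and the alphabetical tie-break among equal-hop paths are all introduced for this example.

```python
from collections import deque


def min_hop_parents(node, links):
    """Compute, for every reachable source, the neighbor of `node` that is
    the next node on a minimum-hop path to that source.

    links: iterable of unidirectional links (u, v).
    Returns (parents, hops): parents[src] is the parent for source src and
    hops[src] is the hop count from `node` to src.
    """
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, []).append(v)

    parents = {}
    hops = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        # Assumption: visit neighbors alphabetically to break ties.
        for nxt in sorted(adjacency.get(current, [])):
            if nxt in hops:
                continue
            hops[nxt] = hops[current] + 1
            # A direct neighbor is its own first hop; otherwise inherit it.
            parents[nxt] = nxt if current == node else parents[current]
            queue.append(nxt)
    return parents, hops


# Hypothetical five-node topology; each bi-directional link is a pair
# of unidirectional links.
pairs = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
links = pairs + [(v, u) for u, v in pairs]
parents, hops = min_hop_parents("A", links)
```

In this topology node A reaches C in two hops through either B or D; the sketch selects B only because of the alphabetical tie-break.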

[0060] To communicate according to the TBRPF protocol, each routing node 14 (or node i, when referred to generally) in the subnet 10 stores the following information:

[0061] 1. A topology table, denoted TT_i, consisting of all link-states stored at node i. The entry for link (u, v) in this table is denoted TT_i(u, v) and includes the most recent update (u, v, c, sn) received for link (u, v). The component c represents the cost associated with the link, and the component sn is a serial number for identifying the most recent update affecting link (u, v) received by the node i. The components c and sn of the entry for the link (u, v) are denoted TT_i(u, v).c and TT_i(u, v).sn, respectively. Optionally, the dissemination of multiple link metrics is attainable by replacing the single cost c with a vector of multiple metrics.

[0062] 2. A list of neighbor nodes, denoted N_i.

[0063] 3. For each node u other than node i, the following is maintained:

[0064] a. The parent, denoted p_i(u), which is the neighbor node (“nbr”) of node i that is the next node on a minimum-hop path from node i to node u, as obtained from the topology table TT_i.

[0065] b. A list of children nodes of node i, denoted children_i(u).

[0066] c. The sequence number of the most recent link-state update originating from node u received by node i, denoted sn_i(u). The sequence number is included in the link-state update message. The use of sequence numbers helps achieve reliability despite topology changes, because node i avoids sending another node information that the other node already has. Each node i maintains a counter (i.e., the sequence number) for each link that the node i monitors. That counter is incremented each time the status of the link changes.

[0067] d. The routing table entry for node u, consisting of the next node on a preferred path to node u. The routing table entry for node u can be equal to the parent p_i(u) if minimum-hop routing is used for data packets. However, in general, the routing table entry for node u is not necessarily p_i(u), because the selection of routes for data traffic can be based on any objective.
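The per-node state listed above can be pictured as a small data structure. The sketch below is one possible arrangement, not the layout used by the described embodiment: the class and field names are hypothetical, and the rule that an update replaces a stored entry only when its serial number is strictly greater is an assumption consistent with, but not stated in, the description of the topology table.

```python
from dataclasses import dataclass, field


@dataclass
class LinkState:
    cost: float   # c: cost associated with the link
    sn: int       # sn: serial number of the most recent update


@dataclass
class NodeState:
    node_id: str
    topology: dict = field(default_factory=dict)   # TT_i: (u, v) -> LinkState
    neighbors: set = field(default_factory=set)    # N_i
    parent: dict = field(default_factory=dict)     # p_i(u)
    children: dict = field(default_factory=dict)   # children_i(u)
    latest_sn: dict = field(default_factory=dict)  # sn_i(u)
    next_hop: dict = field(default_factory=dict)   # routing table entries

    def record_update(self, u, v, cost, sn):
        """Store update (u, v, c, sn); return True if the table changed.

        Assumption: an update is accepted only when its serial number is
        greater than the one already stored for link (u, v).
        """
        entry = self.topology.get((u, v))
        if entry is not None and sn <= entry.sn:
            return False
        self.topology[(u, v)] = LinkState(cost, sn)
        self.latest_sn[u] = sn
        return True


state = NodeState("i")
first = state.record_update("u", "v", 1.0, 1)
stale = state.record_update("u", "v", 5.0, 1)
newer = state.record_update("u", "v", 2.0, 2)
```

Keeping `parent` and `next_hop` as separate maps reflects the point made in item d: the routing table entry for a node need not equal the parent used for update dissemination.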

[0068] One embodiment of the TBRPF protocol uses the following message types:

[0069] LINK-STATE UPDATE: A message containing one or more link-state updates (u, v, c, sn).

[0070] NEW PARENT: A message informing a neighbor node that the node has selected that neighbor node to be a parent with respect to one or more sources of updates.

[0071] CANCEL PARENT: A message informing a neighbor that it is no longer a parent with respect to one or more sources of updates.

[0072] HELLO: A message sent periodically by each node i for neighbor discovery.

[0073] NEIGHBOR: A message sent in response to a HELLO message.

[0074] NEIGHBOR ACK: A message sent in response to a NEIGHBOR message.

[0075] ACK: A link-level acknowledgment to a unicast transmission.

[0076] NACK: A link-level negative acknowledgment reporting that one or more update messages sent on the broadcast channel were not received.

[0077] RETRANSMISSION OF BROADCAST: A retransmission, on a unicast channel, of link-state updates belonging to an update message for which a NACK message was received.

[0078] HEARTBEAT: A message sent periodically on the broadcast channel when there are no updates to be sent on this channel, used to achieve reliable link-level broadcast of update messages based on NACKs.

[0079] END OF BROADCAST: A message sent to a neighbor over a unicast channel, to report that updates originating from one or more sources are now being sent on the unicast channel instead of the broadcast channel.

[0080] The formats for the various types of TBRPF protocol messages are described below.

[0081] Building the Minimum-Hop-Path Tree for a Source

[0082] FIG. 3 shows an embodiment of a process by which each routing node 14 selects a parent neighbor node and children neighbor node(s) for each potential source node src in the subnet 10. The selection of the parent and children neighbor nodes for each potential source node src defines a minimum-hop-path tree for that potential source node along which the routing nodes 14 receive and forward link-state updates originating from that source node src. Pseudo-code describing the network-level procedures performed by each routing node 14 is in Appendix A.

[0083] Node i receives (step 90) a message over a communication link. The received message can represent a link-state update, the discovery of a new neighbor node, the loss of a neighbor node, a change in the cost of a link to a neighbor node, a selection of a new parent neighbor node, or a cancellation of a parent neighbor node. Pseudo-code for processing these types of received messages is provided in Appendix A; the corresponding procedures are called Process_Update, Link_Up, Link_Down, Link_Change, Process_New_Parent, and Process_Cancel_Parent, respectively. The general procedure followed in response to all of these events, and the specific procedure followed by a node that has just started and has no topology information, are described below.

[0084] If node i receives a message representing a link-state update, the discovery of a new neighbor node, the loss of a neighbor node, or a change in the cost of a link to a neighbor node, node i enters (step 100) the new link-state information, if any, into the topology table, TT_i, and forwards (step 102) the link-state information in a link-state update to the neighbor nodes in children_i(src), where src is the source node at which the update originated. Node i then computes (step 104) the parent nodes p_i(u) for all potential source nodes u by running a shortest-path algorithm such as Dijkstra's algorithm. If this computation results in a change to the parent node p_i(u) for any source u, node i then sends a NEW PARENT(u, sn) message, where sn=sn_i(u), to the new parent node p_i(u) and a CANCEL PARENT message to the old parent node (step 106).

[0085] If node i receives (step 90) a NEW PARENT(u, sn) message from a sending node with source u and sequence number sn, node i adds (step 108) the sending node to node i's list of children nodes children_i(u) for that source u, and then sends (step 110) the sending node a LINK-STATE UPDATE message containing all updates in node i's topology table, TT_i, originating from source u and having a sequence number greater than sn. If node i receives (step 90) a CANCEL PARENT(u) message from a sending node with source u, node i removes (step 112) the sending node from node i's list of children nodes children_i(u) for that source u.
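The handling of NEW PARENT and CANCEL PARENT messages described above (steps 108 through 112) can be sketched as follows. This is an illustrative sketch only, not the Process_New_Parent or Process_Cancel_Parent pseudo-code of Appendix A; the function names, the representation of TT_i as a dictionary mapping (u, v) to (c, sn), and the representation of children_i as a dictionary of sets are assumptions made for the example.

```python
def process_new_parent(topology, children, sender, u, sn):
    """Handle NEW PARENT(u, sn) from `sender`.

    Adds the sender to children_i(u) and returns the link-state updates
    (u, v, c, sn') for source u whose sequence number exceeds sn.
    """
    children.setdefault(u, set()).add(sender)
    return [
        (a, b, cost, seq)
        for (a, b), (cost, seq) in sorted(topology.items())
        if a == u and seq > sn
    ]


def process_cancel_parent(children, sender, u):
    """Handle CANCEL PARENT(u) from `sender`: remove it from children_i(u)."""
    children.get(u, set()).discard(sender)


# Example: node i knows two links with head u; the sender already has
# everything up to sequence number 3, so only the newer update is returned.
topology = {("u", "v"): (1, 3), ("u", "w"): (2, 7), ("x", "y"): (1, 9)}
children = {}
reply = process_new_parent(topology, children, "j", "u", 3)
```

Here reply contains only the update for link (u, w), because the update for link (u, v) carries a sequence number that is not greater than 3.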

[0086] Consider, for example, the case in which node i initially has no topology information. Accordingly, node i has no links to neighbor nodes, and its topology table TT_i is empty. Also, the parent node is p_i(src)=NULL (i.e., not defined), the set children_i(src) is the empty set, and sn_i(src)=0 for each source node src. Upon receiving (step 90) messages representing the discovery of neighbor nodes, node i executes the Link_Up procedure to process each link established with each neighbor node nbr. Because each neighbor node nbr of node i is (trivially) the next node on the minimum-hop path from node i to neighbor node nbr, node i selects (step 104) each of its neighbor nodes nbr as the new parent node p_i(nbr) for source node nbr. Execution of the Link_Up procedure results in node i sending (step 106) a NEW PARENT message to each neighbor node nbr. Therefore, the NEW PARENT message sent to a new neighbor node nbr contains the neighbor node nbr (and possibly other sources) in its source list.

[0087] In response to the NEW PARENT message, each neighbor node nbr informs (step 110) node i of the outgoing links of neighbor node nbr. Information about the outgoing links of neighbor node nbr allows node i to compute minimum-hop paths to the nodes at the other end of the outgoing links, and thus to compute (step 104) new parents p_i(src) for all source nodes src that are two hops away. Node i sends (step 106) a NEW PARENT message to each of these computed new parents. Then each parent p_i(src) for each such source node src informs (step 110) node i of the outgoing links for source node src, which allows node i to compute (step 104) new parents for all source nodes that are three hops away. This process continues until node i has computed parent nodes for all source nodes src in the subnet 10. As a result, for a given source node src, the parents p_i(src) for all nodes i other than source node src define a minimum-hop-path tree rooted at source node src (after the protocol has converged).

[0088] Node i cancels an existing parent p_i(src) by sending a CANCEL PARENT(src) message containing the identity of the source node src. Consequently, the set of children, children_i(src), at node i with respect to source node src is the set of neighbor nodes from which node i has received a NEW PARENT message containing the identity of source node src without receiving a subsequent CANCEL PARENT message for that source node src. Node i can also simultaneously select a neighbor node as the parent for multiple sources, so that the node i sends a NEW PARENT(src_list, sn_list) message to the new parent, where src_list is the list of source nodes and sn_list is the corresponding list of sequence numbers. Similarly, a CANCEL PARENT message can contain a list of sources.

[0089] In one embodiment, the TBRPF protocol does not use NEW PARENT and CANCEL PARENT messages in the generation of the minimum-hop-path tree. Instead, each node i computes the minimum-hop paths from each neighbor node nbr to all destinations (e.g., by using breadth-first search or Dijkstra's shortest-path algorithm). Consequently, each node i computes the parents p_nbr(src) for each neighbor node nbr and source node src, from which node i determines which neighbor nodes nbr are its children for the given source node src. Although this process eliminates NEW PARENT and CANCEL PARENT messages, the process also requires that each node i (1) sends all updates originating from the source node src to any child node in children_i(src), or (2) periodically sends updates along the minimum-hop-path tree, because node i does not know the sequence number sn_nbr(src) from the neighbor node nbr and thus does not know what updates the neighbor node nbr already has. Either of these actions ensures that each neighbor node nbr receives the most recent information for each link.

[0090] FIG. 4 shows an embodiment of an exemplary minimum-hop-path tree 120 for the nodes 18 in the subnet 10 of FIG. 1. For the sake of illustration, assume that node D is the source of an update. The parent 122 for nodes C, G, and H with respect to the source node D is node D; the parent 124 for node F with respect to source node D is node H; the parent 126 for nodes A and B with respect to source node D is node F; the parent 128 for node E is node B; and the parent 130 for node L is node A. (In this example, node A is a routing node 14, and thus runs the TBRPF protocol.)

[0091] Conversely, the children 132 of node D are nodes C, G, and H; the child 134 of node H is node F; the children 136 of node F are nodes A and B; the child 138 of node B is node E; and the child 140 of node A is node L. As shown, nodes C, E, G, and L are leaf nodes, which, in accordance with the TBRPF protocol, do not have to forward updates originating from the source node D.

[0092] Updating the Minimum-Hop-Path Tree

[0093] In brief, the TBRPF protocol disseminates link-state updates generated by a source node src along the minimum-hop-path tree rooted at node src and dynamically updates the minimum-hop-path tree based on the topology and link-state information received along the minimum-hop-path tree. More specifically, whenever the topology table TT_i of node i changes, the node i computes its parent p_i(src) with respect to every source node src (see the procedure Update_Parents in Appendix A). The node i computes parents by (1) computing minimum-hop paths to all other nodes using, for example, Dijkstra's algorithm, and (2) selecting the next node on the minimum-hop path to each source node src to be the parent for that source node src (see the procedure Compute_New_Parents in Appendix A). The computation of parents occurs when the node i receives a topology update, establishes a link to a new neighbor node, or detects a failure or change in cost of a link to an existing neighbor node.
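The two-step parent computation described above can be illustrated with a breadth-first search, which yields minimum-hop paths when every usable link counts as one hop. The sketch below is not the Compute_New_Parents procedure of Appendix A; the function name compute_parents, the input format (a list of directed usable links), the example topology, and the alphabetical tie-break among equal-length paths are all assumptions made for the example.

```python
from collections import deque


def compute_parents(node_i, links):
    """Return {src: p_i(src)} for every src reachable from node_i.

    `links` is an iterable of directed usable links (u, v). The parent for
    src is the first hop on a minimum-hop path from node_i to src. Ties are
    broken alphabetically here, which is an assumption of this sketch.
    """
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, []).append(v)

    first_hop = {node_i: None}
    queue = deque([node_i])
    while queue:
        x = queue.popleft()
        for y in sorted(adjacency.get(x, [])):
            if y not in first_hop:
                # A neighbor of node_i is its own first hop; any other node
                # inherits the first hop of the node it was reached through.
                first_hop[y] = y if x == node_i else first_hop[x]
                queue.append(y)
    del first_hop[node_i]
    return first_hop


# Hypothetical five-node topology with bidirectional links.
edges = [("i", "a"), ("i", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
links = edges + [(v, u) for u, v in edges]
parents = compute_parents("i", links)
```

In this example, nodes c and d are reached through neighbor a, so a is selected as the parent for sources a, c, and d, while b is the parent only for source b. Sources that are unreachable simply do not appear in the result, which corresponds to a parent of NULL.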

[0094] In one embodiment, node i computes a new parent p_i(src) for a given source node src even though the path to the source node src through the new parent has the same number of hops as the path to the source node src through the old parent. In another embodiment, the node keeps the old parent node in this event, thus reducing the overhead of the TBRPF protocol. This embodiment can be implemented, for example, by using the procedure Compute_New_Parents2 (given in Appendix A) instead of the procedure Compute_New_Parents.

[0095] If the parent p_i(src) changes, node i sends the message CANCEL PARENT(src) to the current (i.e., old) parent, if the old parent exists. Upon receiving the CANCEL PARENT(src) message, the old parent (“k”) removes node i from the list children_k(src).

[0096] Node i also sends the message NEW PARENT(src, sn) to the newly computed parent if the new parent exists, where sn=sn_i(src) is the sequence number of the most recent link-state update originating from source node src received by node i. This sequence number indicates the “position” up to which node i has received updates from the old parent, and indicates to the new parent that it should send only those updates that occurred subsequently (i.e., after that sequence number).

[0097] Upon receiving the NEW PARENT(src, sn) message, the new parent “j” for p_i(src) adds node i to the list children_j(src) and sends to node i a link-state update message consisting of all the link states originating from source node src in its topology table that have a sequence number greater than sn (see the procedure Process_New_Parent in Appendix A). Thus, only updates not yet known to node i are sent to node i.

[0098] Generally, the range of sequence numbers is large enough so that wraparound does not occur. However, if a small sequence number range is used, wraparound can be handled by employing infrequent periodic updates with a period that is less than half the minimum wraparound period, and by using a cyclic comparison of sequence numbers. That is, sn is considered less than sn′ if either sn is less than sn′ and the difference between sn and sn′ (sn′−sn) is less than half the largest possible sequence number, or sn′ is less than sn and the difference, sn−sn′, is greater than half the largest possible sequence number.
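The cyclic comparison rule stated above can be written directly as a small function. This is an illustrative sketch; the function name cyclic_less and the parameter max_sn (the largest possible sequence number) are hypothetical names introduced for the example.

```python
def cyclic_less(sn, sn_prime, max_sn):
    """Return True if sn is considered less than sn_prime under wraparound.

    Implements the rule in the text: either sn < sn_prime with a difference
    below half of max_sn, or sn_prime < sn with a difference above half of
    max_sn (sn_prime has wrapped around past the maximum).
    """
    half = max_sn / 2
    if sn < sn_prime:
        return (sn_prime - sn) < half
    if sn_prime < sn:
        return (sn - sn_prime) > half
    return False  # equal sequence numbers are not "less than"


# With an 8-bit range, 250 precedes 5 because 5 is taken to have wrapped.
older_before_wrap = cyclic_less(250, 5, 255)
```

The comparison is only meaningful while the two sequence numbers are less than half the range apart, which is why the text pairs it with periodic updates whose period is less than half the minimum wraparound period.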

[0099] When a node i detects the existence of a new neighbor nbr, it executes Link_Up(i, nbr) to process this newly established link. The link cost and sequence number fields for this link in the topology table at node i are updated. Then, the corresponding link-state message is sent to all neighbors in children_i(i). As noted above, node i also recomputes its parent node p_i(src) for every node src, in response to this topological change. In a similar manner, when node i detects the loss of connectivity to an existing neighbor node nbr, node i executes Link_Down(i, nbr). Link_Change(i, nbr) is likewise executed at node i in response to a change in the cost to an existing neighbor node nbr. However, this procedure does not recompute parents.

[0100] In one embodiment, if a path between the node i and a given source node src ceases to exist, the node i computes a new parent p_i(src) that is set to NULL (i.e., parent does not exist). In another embodiment, although the path between the node i and the given source node src ceases to exist, the node i keeps the current parent, if the current parent is still a neighbor node of the node i. Thus, the overhead of the TBRPF protocol is reduced because it is unnecessary to send a CANCEL PARENT message and a subsequent NEW PARENT message if the old path to the source becomes operational later because of a link recovery. This embodiment can be implemented by replacing the fifth line of the procedure Update_Parents in Appendix A, “If (new_p_i(src)!=p_i(src)){”, with the line “If (new_p_i(src)!=p_i(src) and new_p_i(src)!=NULL){”.

[0101] The TBRPF protocol does not use an age field in link-state update messages. However, failed links (represented by an infinite cost) and links that are unreachable (i.e., links (u, v) such that p_i(u)=NULL) are deleted from the topology table TT_i after MAX_AGE seconds (e.g., 1 hour) in order to conserve memory. Failed links (u, v) are maintained for some time in the topology table TT_i, rather than deleted immediately, to ensure that a node i that changes its parent p_i(u) near the time of failure (or had no parent p_i(u) during the failure) is informed of the failure by the new parent.

[0102] Unreachable links (i.e., links (u, v) such that node i and node u are on different sides of a network partition) are maintained for a period of time to avoid having to rebroadcast the old link state for (u, v) throughout node i's side of the partition if the network partition soon recovers, which can often happen if the network partition is caused by a marginal link that oscillates between the up and down states. If a link recovers, resulting in the reconnection of two network components that were disconnected (i.e., partitioned) prior to the link recovery, the routing nodes 14 in one partition may temporarily have invalid routes to nodes 18 in the other partition. This occurs because the routing nodes 14 may receive an update message for the link recovery before receiving update messages for links in the other partition. Consequently, the link-state information for those links in the other partition may be outdated temporarily.

[0103] To correct this situation, in one embodiment, a header field is added to each link-state update message, which indicates whether the update message is sent in response to a NEW PARENT message. The header field also identifies the corresponding NEW PARENT message using a sequence number. For example, if a given node i sends a NEW PARENT message (for multiple sources) to node j following the recovery of the link (i, j), the node i waits for a response from node j to the NEW PARENT message before sending to node i's neighbor nodes an update message corresponding to the link recovery. The response from node j includes the link-state information of the other nodes 18 in the previously disconnected partition. Then node i forwards this link-state information to node i's neighbor nodes. Consequently, the nodes 18 in the same partition as node i receive updates for the links in the other partition at the same time that the nodes 18 receive the update for the link recovery. Thus, the link-state information for those links in the other partition is not outdated temporarily.

[0104] A node i that is turned off (or goes to sleep) operates as if the links to all neighbors have gone down. Thus, the node i remembers the link-state information that it had when turned off. Since all such links are either down or unreachable, these link states are deleted from the topology table TT_i if the node i awakens after being in sleep mode for more than MAX_AGE seconds.

[0105] Infrequent periodic updates occur to correct errors that may appear in table entries or update messages. (See Send_Periodic_Updates in Appendix A.) As discussed above, periodic updates are also useful if the sequence number range is not large enough to avoid wraparound.

[0106] Initiating an Update Message

[0107] When a given routing node 14 detects a change in the state of a neighbor node, that routing node 14 becomes the source (i.e., source node src) of a link-state update message with respect to the corresponding link to that neighbor node. As described above, the source node src then broadcasts each link-state update along the minimum-hop-path tree rooted at the source of the update.

[0108] A link-state update message reports the state of the link (src, nbr) as a tuple (src, nbr, c, sn), where c and sn are the cost and the sequence number associated with the update. A cost of infinity represents a failed link. The source node src is the head node of link (src, nbr), and is the only node that can report changes to parameters of link (src, nbr). Therefore, any node 18 receiving the link-state update (src, nbr, c, sn) can determine that the update originated from the source node src.

[0109] The source node src maintains a counter sn_src, which is incremented by at least one each time the cost of one or more outgoing links (src, nbr) changes value. For example, the counter sn_src can be a time stamp that represents the number of seconds (or other units of time) elapsed from some fixed time. When the source node src generates a link-state update (src, nbr, c, sn), the sequence number sn is set to the current value of sn_src.

[0110] Receiving an Update Message

[0111] In brief, each routing node 14 that receives a link-state update message receives that update message along a single path. That is, any link-state update originating from source node src is accepted by node i if (1) the link-state update is received from the parent node p_i(src), and (2) the link-state update has a larger sequence number than the corresponding link-state entry in the topology table TT_i at node i. If the link-state update is accepted, node i enters the link-state update into the topology table TT_i. Node i may then forward the link-state update to zero or more children nodes in children_i(src). In one embodiment, the link-state update passes to every child node in children_i(src). (See the procedures Update_Topology_Table and Process_Update in Appendix A.)
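The two acceptance conditions and the forwarding step described above can be sketched as follows. The sketch is illustrative and is not the Process_Update procedure of Appendix A; the function name, the dictionary representations of TT_i, p_i, and children_i, and the return convention are assumptions. The example values mirror node F of FIG. 4, whose parent for source D is node H and whose children for source D are nodes A and B; the link costs and sequence numbers are invented for the example.

```python
def process_update(topology, parent, children, sender, update):
    """Apply reverse-path forwarding to one update (src, nbr, c, sn).

    Returns (accepted, forward_to). The update is accepted only if it
    arrives from the parent p_i(src) and carries a larger sequence number
    than the stored entry; an accepted update is forwarded to every node
    in children_i(src), as in the embodiment that forwards to all children.
    """
    src, nbr, c, sn = update
    if parent.get(src) != sender:
        return False, []          # not received from the parent for src
    stored = topology.get((src, nbr))
    if stored is not None and sn <= stored[1]:
        return False, []          # not newer than the stored link state
    topology[(src, nbr)] = (c, sn)
    return True, sorted(children.get(src, set()))


# State at node F for source D (costs and sequence numbers are invented).
topology = {("D", "C"): (1, 4)}
parent = {"D": "H"}
children = {"D": {"A", "B"}}

from_parent = process_update(topology, parent, children, "H", ("D", "C", 2, 5))
from_other = process_update(topology, parent, children, "G", ("D", "C", 3, 6))
duplicate = process_update(topology, parent, children, "H", ("D", "C", 2, 5))
```

Only the first call is accepted and forwarded to A and B. The second is rejected because node G is not the parent for source D, and the third because its sequence number is no longer larger than the stored one.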

[0112] Forwarding Update Messages

[0113] In most link-state routing protocols, e.g., OSPF (Open Shortest Path First), each routing node 18 forwards the same link-state information to all neighbor nodes. In contrast, in one embodiment of the TBRPF protocol, each routing node 14 sends each link-state update only to neighbor nodes that are children on the minimum-hop-path tree rooted at the source of the update. Each routing node 14 having no children for the source node src of the link-state update is a leaf in the minimum-hop-path tree and therefore does not forward updates originating from the source node src. In typical networks, most nodes 18 are leaves, and thus the TBRPF protocol makes efficient use of the bandwidth of the subnet 10. In addition, those nodes having only one child node for the source node src can send updates generated by the source node src to that child node only, instead of broadcasting the updates to all neighbor nodes.

[0114] The TBRPF protocol may utilize bandwidth more efficiently by using unicast transmissions when a routing node 14 has only one child, or a few children, for the source of the update, and broadcast transmissions when several children exist for the update. Therefore, in one embodiment, the TBRPF protocol determines whether to use unicast or broadcast transmissions, depending on the number of children nodes and the total number of neighbor nodes.

[0115] In general, each routing node 14 uses unicast transmissions for updates with only one intended receiver (e.g., only one child), and broadcast transmissions for updates with several intended receivers, to avoid transmitting the update message several times. Therefore, each routing node 14 uses unicast transmission if k=1 and uses broadcast if k>1, where k is the number of intended receivers. A possible drawback can occur if the number of children nodes exceeds one and there are many more neighbors. For example, if there are two children nodes and twenty neighbor nodes (i.e., k=2 and n=20, where k is the number of children nodes and n is the number of neighbors), then 18 neighbor nodes are listening to a message not intended for them. Such neighbor nodes could instead be sending or receiving other messages.

[0116] To avoid this possible drawback, one option is to use broadcast transmission if k>(n+1)/2 and unicast transmission in all other cases. In general, a rule of the form k>g(n) can be used. For update messages, the number of children k may be different for different update sources. Therefore, it is possible to use unicast transmissions for some sources and broadcast transmissions for other sources, and the transmission mode for a given source u, denoted mode_i(u), can change dynamically between unicast and broadcast as the number of children changes.
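The rule k>g(n) can be illustrated with a short function in which the threshold function g is a parameter. The function name select_mode and the string constants are hypothetical; the default threshold (n+1)/2 is the option given in the text, and passing a constant threshold of 1 reproduces the simpler rule of using broadcast whenever k>1.

```python
def select_mode(k, n, g=lambda n: (n + 1) / 2):
    """Choose the transmission mode for one source.

    k is the number of children for the source and n is the number of
    neighbors. Broadcast is used only when k > g(n).
    """
    return "BROADCAST" if k > g(n) else "UNICAST"


# The example from the text: two children among twenty neighbors.
mode_default = select_mode(2, 20)                   # threshold (n+1)/2 = 10.5
mode_simple = select_mode(2, 20, g=lambda n: 1)     # simple rule k > 1
```

With the default threshold, the two-children case stays in unicast mode, so the other 18 neighbors are not made to listen to a message that is not intended for them.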

[0117] While LINK-STATE-UPDATE messages can be transmitted in either unicast or broadcast mode, HELLO messages and HEARTBEAT messages (discussed below) are always transmitted on the broadcast channel, and the following messages are always transmitted on the unicast channel (to a single neighbor): NEIGHBOR, NEIGHBOR ACK, ACK, NACK, NEW PARENT, CANCEL PARENT, RETRANSMISSION OF BROADCAST, END OF BROADCAST, and LINK-STATE-UPDATE messages sent in response to a NEW PARENT message.

[0118] Exemplary pseudo-code for a procedure for sending a LINK-STATE UPDATE message (that is not a response to a NEW PARENT message) on the broadcast or unicast channel is as follows:

    If (mode_i(src) == BROADCAST)
        Append the message update_msg to the message queue associated with the broadcast channel.
    If (mode_i(src) == UNICAST)
        For (each node k in children_i(src))
            Append the message update_msg to the message queue associated with the unicast channel to node k.

[0119] Reliable unicast transmission of control packets can be achieved by a variety of reliable link-layer unicast transmission protocols that use sequence numbers and ACKs, and that retransmit a packet if an ACK is not received for that packet within a specified amount of time.

[0120] Reliable Transmission in Broadcast Mode

[0121] For reliable transmission of Link-State Update messages in broadcast mode, each broadcast update message includes one or more link-state updates, denoted lsu(src), originating from sources src for which the transmission mode is BROADCAST. Each broadcast control packet is identified by a sequence number that is incremented each time a new broadcast control packet is transmitted. Reliable transmission of broadcast control packets in TBRPF can be accomplished using either ACKs or NACKs. If ACKs are used, then the packet is retransmitted after a specified amount of time if an ACK has not been received from each neighbor node that must receive the message.

[0122] In one embodiment of TBRPF, NACKs are used instead of ACKs for reliable transmission of broadcast control packets, so that the amount of ACK/NACK traffic is minimized if most transmissions are successful. Suppose node i receives a NACK from a neighbor node nbr for a broadcast update message. In one embodiment, all updates lsu(src) in the original message, for each source node src such that neighbor node nbr belongs to children_i(src), are retransmitted (reliably) on the UNICAST channel to the neighbor node nbr, in a RETRANSMISSION OF BROADCAST message. This message includes the original broadcast sequence number to allow neighbor node nbr to process the updates in the correct order. In another embodiment, such update messages are retransmitted on the broadcast channel. This embodiment may improve the efficiency of the TBRPF protocol in subnets that do not support receiver-directed transmission, because in such subnets unicast transmission provides no efficiency advantage over broadcast transmissions.

[0123] The procedure for the reliable transmission of broadcast update packets uses the following message types (in addition to LINK-STATE UPDATE messages): HEARTBEAT(sn), NACK(sn, bit_map), and RETRANSMISSION OF BROADCAST(sn, update_msg). A NACK(sn, bit_map) message contains the sequence number (sn) of the last received broadcast control packet, and a 16-bit vector (bit_map) specifying which of the 16 broadcast control packets from sn−15 to sn have been successfully received.
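The bit_map field of the NACK message can be illustrated as follows. The text specifies only that the 16-bit vector covers the packets from sn−15 to sn; the bit ordering used here (bit 0 for packet sn, bit 15 for packet sn−15) and the function names are assumptions made for the sketch, and sequence-number wraparound is ignored.

```python
def build_bitmap(sn, received):
    """Build the 16-bit vector for NACK(sn, bit_map).

    Bit `offset` is set when packet sn - offset was received successfully.
    The bit ordering is an assumption of this sketch.
    """
    bit_map = 0
    for offset in range(16):
        if (sn - offset) in received:
            bit_map |= 1 << offset
    return bit_map


def missing_from_bitmap(sn, bit_map):
    """List, oldest first, the sequence numbers in sn-15..sn marked as missing."""
    return [sn - offset for offset in range(15, -1, -1)
            if not (bit_map >> offset) & 1]


# A receiver holds packets 5..20 except 19, and reports this in a NACK.
received = set(range(5, 21)) - {19}
bit_map = build_bitmap(20, received)
to_retransmit = missing_from_bitmap(20, bit_map)
```

On receiving such a NACK, the sender would retransmit the updates of packet 19 in a RETRANSMISSION OF BROADCAST message on the unicast channel to that neighbor.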

[0124] A description of the procedure for the reliable transmission of broadcast update packets at node i uses the following exemplary notation:

[0125] Pkt(sn) represents a control packet with sequence number sn transmitted on the broadcast channel by node i.

[0126] MsgQ represents a message queue for new control messages to be sent on the broadcast channel from node i.

[0127] brdcst_sn_i represents the sequence number of the last packet transmitted on the broadcast channel by node i.

[0128] Heartbeat_Timer represents a timer used in the transmission of the HEARTBEAT message.

[0129] Following the transmission of the broadcast control packet Pkt(brdcst_sn_i) on the broadcast channel, node i increments brdcst_sn_i and reinitializes Heartbeat_Timer. When Heartbeat_Timer expires at node i, the node i appends the control message HEARTBEAT(brdcst_sn_i) to the message queue associated with the broadcast channel, and reinitializes Heartbeat_Timer. When the node i receives NACK(sn, bit_map) from neighbor node nbr, node i performs the functions illustrated by the following exemplary pseudo-code:

    For each (sn' not received as indicated by bit_map){
        Let update_msg = {(src*, v*, sn*, c*) in Pkt(sn') such that the neighbor node nbr is in children_i(src*)}.
        Append the message RETRANSMISSION OF BROADCAST(sn', update_msg) to the message queue associated with the unicast channel to neighbor node nbr. (Message must be sent even if update_msg is empty.)
    }

[0130] Upon receipt at neighbor node nbr of control packet Pkt(sn) transmitted on the broadcast channel by node i, the neighbor node nbr performs the operations illustrated by the following pseudo-code:

    If the control packet Pkt(sn) is received in error{
        Append the control message NACK(sn, bit_map) to the message queue associated with the unicast channel to node i.
    }
    If the control packet Pkt(sn) is received out of order (i.e., at least one previous sequence number is skipped){
        Withhold the processing of the control packet Pkt(sn).
        Append the control message NACK(sn, bit_map′) to the message queue associated with the unicast channel to node i.
    }
    Else (control packet Pkt(sn) is received correctly and in order){
        For each Link-State Update message update_msg in Pkt(sn), call Process_Update(i, nbr, update_msg).
    }

[0131] When a communication link is established from node i to a new neighbor nbr, in one embodiment the node i obtains the current value of brdcst_sn_nbr from the NEIGHBOR message or NEIGHBOR ACK that was received from neighbor node nbr.

[0132] Each node i can dynamically select the transmission mode for link-state updates originating from each source node src. As described above, this decision uses a rule of the form k>g(n), where k is the number of children (for src) and n is the number of neighbors of node i. However, to ensure that updates are received in the correct order, or that the receiver has enough information to reorder the updates, node i sends an END OF BROADCAST(last_seq_no, src) message on the unicast channel to each child when the mode changes to UNICAST, and waits for all update packets sent on unicast channels to be ACKed before changing to BROADCAST mode.

[0133] To facilitate this process, each node i maintains a binary variable unacked_i(nbr, src) for each neighbor node nbr and source node src, indicating whether there are any unACKed control packets sent to neighbor node nbr containing link-state updates originating at source node src. The following exemplary pseudo-code illustrates an embodiment of a procedure that is executed periodically at each node i.

    For each (node src){
        If (mode_i(src) = BROADCAST and |children_i(src)| <= g(n)){
            For each (node nbr in children_i(src)){
                Append the message END OF BROADCAST(brdcst_sn_i, src) to the message queue associated with the unicast channel to node nbr.
            }
            Set mode_i(src) = UNICAST.
        }
        If (mode_i(src) = UNICAST and |children_i(src)| > g(n)){
            Set switch_flag = YES.
            For each (node nbr in children_i(src)){
                If (unacked_i(nbr, src) = YES) Set switch_flag = NO.
            }
            If (switch_flag = YES) Set mode_i(src) = BROADCAST.
        }
    }
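For illustration, the periodic mode-switching procedure described above can be rendered as a runnable sketch. The function name update_modes, the dictionary representations of mode_i, children_i, and unacked_i, and the returned list of queued messages are assumptions made for the example; the default threshold g(n)=(n+1)/2 is the option mentioned earlier in the text.

```python
def update_modes(mode, children, unacked, n, brdcst_sn,
                 g=lambda n: (n + 1) / 2):
    """Run one pass of the periodic mode-switching procedure at node i.

    mode:     {src: "BROADCAST" | "UNICAST"}   (mode_i, updated in place)
    children: {src: set of neighbor ids}        (children_i)
    unacked:  {(nbr, src): bool}                (unacked_i)
    Returns the (nbr, message) pairs queued on unicast channels.
    """
    queued = []
    for src, kids in children.items():
        if mode.get(src) == "BROADCAST" and len(kids) <= g(n):
            # Tell each child where the broadcast stream for src ends.
            for nbr in sorted(kids):
                queued.append((nbr, ("END OF BROADCAST", brdcst_sn, src)))
            mode[src] = "UNICAST"
        elif mode.get(src) == "UNICAST" and len(kids) > g(n):
            # Switch back only when no unicast update for src is unACKed.
            if not any(unacked.get((nbr, src), False) for nbr in kids):
                mode[src] = "BROADCAST"
    return queued


mode = {"s": "BROADCAST", "t": "UNICAST"}
children = {"s": {"a"}, "t": {"a", "b", "c"}}
unacked = {("b", "t"): True}
first_pass = update_modes(mode, children, unacked, n=4, brdcst_sn=17)
mode_after_first = dict(mode)
unacked[("b", "t")] = False
second_pass = update_modes(mode, children, unacked, n=4, brdcst_sn=17)
```

In the first pass, source s drops to unicast mode and source t remains in unicast mode because an update to neighbor b is still unACKed; in the second pass, after that update is ACKed, source t switches to broadcast mode.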

[0134] Full and Partial Topology TBRPF

[0135] In one embodiment, a result of running the TBRPF protocol is that each router 14 in the subnet 10 obtains the state of each link in the subnet 10 (or within a cluster if hierarchical routing is used). Accordingly, this embodiment of the TBRPF protocol is referred to as a full-topology link-state protocol. In some embodiments, described below, the TBRPF protocol is a partial-topology link-state protocol in that each router 14 maintains a subset of the communication links in the subnet 10. In the full-topology protocol embodiment, each routing node 14 is provided with the state of each link in the subnet 10 (or cluster, if hierarchical routing is used). In other embodiments, the TBRPF protocol is a partial-topology protocol in that each routing node 14 is provided with only a subset of the links in the subnet 10.

[0136] In the full-topology link-state protocol embodiment, (1) alternate paths and disjoint paths are immediately available, allowing faster recovery from failures and topology changes; and (2) paths can be computed subject to any combination of quality-of-service (QoS) constraints and objectives. Partial-topology link-state protocols provide each node 18 with sufficient topology information to compute at least one path to each destination. Whether implemented as a full-topology or as a partial-topology protocol, the TBRPF protocol is a proactive link-state protocol in that each node 18 dynamically reacts to link-state and topology changes and maintains a path to each possible destination in the subnet 10 at all times.

[0137] A Partial-Topology Embodiment

[0138] In one partial-topology embodiment, each routing node 14 decides which of its outgoing links (i, j), called “special links,” should be disseminated to all nodes in the subnet 10. This subset of links is maintained in a list L_i. All other outgoing links are sent only one hop (i.e., to all neighbor nodes of node i). Node i sends an update to its neighbor nodes if that update is the addition or removal of a link from the list L_i, or reflects a change in the state of a link in the list L_i.

[0139] Various rules can be used to define the set of special links in the list L_i. For example, one rule defines a link (i, j) to be in L_i only if node j is the parent of node i for some source node other than node j, or if node j belongs to the set children_i(src) for some source node src other than node i. This definition of special links includes enough links to provide minimum-hop paths between any pair of nodes. As a result, this partial-topology embodiment reduces the amount of control traffic without reducing the quality of the routes. In this embodiment, an update (u, v, c, sn, sp) is augmented to include an “sp” field (e.g., a single-bit field), which indicates whether the link (u, v) is a special link. Pseudo-code representing an exemplary implementation of the partial-topology embodiment appears in Appendix A, after the “Partial-Topology 1” header. The procedure Mark_Special_Links(i) is called upon a change to the parent p_i(src) or to the set of children nodes children_i(src).
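The example rule for special links given above can be sketched as follows. This is not the Mark_Special_Links procedure of Appendix A; the function name special_links, the dictionary representations of p_i and children_i, and the example values are assumptions made for illustration.

```python
def special_links(i, neighbors, parent, children):
    """Return the set L_i of special links (i, j) under the example rule.

    Link (i, j) is special if j is the parent of i for some source other
    than j itself, or if j is in children_i(src) for some source src other
    than i itself.
    """
    special = set()
    for j in neighbors:
        is_parent = any(p == j and src != j for src, p in parent.items())
        is_child = any(j in kids and src != i
                       for src, kids in children.items())
        if is_parent or is_child:
            special.add((i, j))
    return special


# Hypothetical state at node i with neighbors a, b, and c.
parent = {"a": "a", "b": "b", "c": "c", "d": "a"}   # a leads to remote source d
children = {"i": {"b"}, "d": {"c"}}                # c is a child for source d
L_i = special_links("i", {"a", "b", "c"}, parent, children)
```

In this example, link (i, a) is special because a is the parent for source d, and link (i, c) is special because c is a child for source d. Link (i, b) is not special, because b is a parent only for itself and a child only for source i.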

[0140] A Second Partial-Topology Embodiment

[0141] In another partial-topology embodiment, each routing node 14, hereafter node i, maintains a topology table TT_i, a source tree Ti (i.e., computed paths to all destinations), a set of reported links Ri, and a set of neighbor nodes Ni. The entry of TT_i for a link (u, v) is denoted TT_i(u, v) and consists of the tuple (u, v, c, c′), where c is the cost associated with the link and c′ is the last cost reported to neighbor nodes for the link. The component c of the entry for link (u, v) is denoted TT_i(u, v).c. In addition, a parent p_i(u) and set of children nodes children_i(u) are maintained for each node u≠node i. The parent p_i(u) is the next node on a shortest path to node u, based on the information in TT_i. The source tree Ti, computed by a lexicographic version of Dijkstra's algorithm, is the set of links that belong to at least one of the computed paths. The set of reported links Ri includes the source tree Ti and any link in TT_i for which an update has been sent but a delete update has not since been sent. In addition, a binary variable pending_i(u) is maintained for each node u≠node i, which indicates that the parent p_i(u) is pending, i.e., that a NEW PARENT(u) message has been sent to p_i(u) but no response has yet been received. In general, each node i reports to neighbor nodes the current states of only those links in its source tree Ti, but sends only part of its source tree Ti to each neighbor node such that no node receives the same information from more than one neighbor node. Pseudo-code representing an exemplary implementation of this partial-topology embodiment of the TBRPF protocol appears in Appendix A, after the “Partial-Topology 2” header.

[0142] Upon receiving an update message, consisting of one or more updates (u, v, c), node i executes the procedure Update( ), which calls the procedure Update_Topology_Table( ), then executes the procedure Lex_Dijkstra( ) to compute the new source tree Ti and the procedure Generate_Updates( ) to generate updates and modify the set of reported links Ri based on changes in link costs and changes to the source tree Ti. Each generated update is then sent to the appropriate children, that is, updates for links with head u are sent to children_i(u). The procedure Update_Parents( ) is called, which determines any changes in the parent assignment and sends NEW PARENT and CANCEL PARENT messages.

[0143] The sending of updates can be accomplished in different ways, depending on whether the subnet 10 consists of point-to-point links, broadcast links, or a combination of both link types. In a network of point-to-point links, each neighbor node k would be sent a message that contains the updates for links (u, v) such that k belongs to children_i(u). If a broadcast capability also exists, links (u, v) for which children_i(u) has more than one member can be broadcast to all neighbor nodes.

[0144] The procedure Update_Topology_Table( ) does the following for each update (u, v, c) in the input message (in_message) such that the parent p_i(u) is the neighbor node that sent the message. (Updates received from a node other than the parent are ignored.) If either TT_i does not contain an entry for (u, v) or contains an entry with a different cost than c, then TT_i(u, v) is updated with the new value c and link (u, v) is marked as changed. If the input message is a PARENT RESPONSE, then in addition to updates, the message contains the same list of sources as the NEW PARENT message to which it is responding. For each such source node u such that pending_i(u)=1 and for each link (u, v) in TT_i that is outgoing from source node u but for which the input message does not contain an update, the cost of (u, v) is set to infinity, to indicate that the link should be deleted. In other words, any link that was reported by the old parent but is not reported by the new parent is deleted. Only information from the current parent is considered valid.
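The parent-filtering step described above can be sketched as follows. This is only an illustration, not the Appendix A pseudo-code: the dictionary layout, the names `update_topology_table` and `c_reported`, and the choice to initialize the last-reported cost of a new entry to infinity are assumptions. The PARENT RESPONSE handling is omitted.

```python
INF = float("inf")


def update_topology_table(tt, parent, sender, updates):
    """Apply updates (u, v, c) received from `sender` to topology table `tt`.

    tt     : dict mapping (u, v) -> {"c": cost, "c_reported": last reported cost}
    parent : dict mapping source node u -> current parent p_i(u)
    Returns the set of links marked as changed.
    """
    changed = set()
    for u, v, c in updates:
        # Updates from a node other than the parent for source u are ignored.
        if parent.get(u) != sender:
            continue
        entry = tt.get((u, v))
        if entry is None:
            # Assumption: a link never reported before has c' = infinity.
            tt[(u, v)] = {"c": c, "c_reported": INF}
            changed.add((u, v))
        elif entry["c"] != c:
            entry["c"] = c
            changed.add((u, v))
    return changed


tt = {}
parent = {"B": "B", "D": "B"}
first = update_topology_table(tt, parent, "B", [("B", "D", 1), ("D", "F", 1)])
ignored = update_topology_table(tt, parent, "C", [("D", "F", 5)])
```

In this usage, the second call changes nothing because node C is not the parent for source node D.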

[0145] The procedure Lex_Dijkstra( ) (not included in Appendix A) is an implementation of Dijkstra's algorithm that computes the lexicographically smallest shortest path LSP(i, u) from node i to each node u, using as path name the sequence of nodes in the path in the reverse direction. For example, the next-to-last node of LSP(i, u) has the smallest node ID among all possible choices for the next-to-last node. Such paths are computed using a modification of Dijkstra's algorithm in which, if there are multiple choices for the next node to label, the one with the smallest ID is chosen.
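Because Lex_Dijkstra( ) is not listed in Appendix A, the following is only a sketch of the tie-breaking idea, not the procedure itself. It assumes integer node IDs, strictly positive link costs, and a hypothetical adjacency-dictionary input. Ties for the next node to label are broken by the smaller ID, and among equal-cost predecessors the smaller ID is kept, which gives the next-to-last-node property in the example above.

```python
import heapq


def lex_dijkstra(graph, src):
    """graph: dict node -> dict neighbor -> positive cost. Returns (dist, pred)."""
    dist = {src: 0}
    pred = {src: None}
    labeled = set()
    heap = [(0, src)]  # ordered by (distance, node ID): smallest ID wins ties
    while heap:
        d, u = heapq.heappop(heap)
        if u in labeled:
            continue  # stale heap entry
        labeled.add(u)
        for v, cost in graph.get(u, {}).items():
            if v in labeled:
                continue
            nd = d + cost
            better = v not in dist or nd < dist[v]
            # Equal cost: prefer the predecessor with the smaller node ID.
            tie = v in dist and nd == dist[v] and u < pred[v]
            if better or tie:
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


# Two equal-cost paths from node 1 to node 4: via node 2 and via node 5.
graph = {1: {5: 1, 2: 2}, 5: {4: 2}, 2: {4: 1}, 4: {}}
dist, pred = lex_dijkstra(graph, 1)
```

Here node 5 is labeled before node 2, yet node 2 is kept as the predecessor of node 4 because both paths cost 3 and node 2 has the smaller ID.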

[0146] The procedure Generate_Updates( ) decides what updates to include in the message to be sent to neighbor nodes. A non-delete update is included for any link (u, v) that is in the new source tree Ti and either is marked as changed or was not in the previous source tree (denoted old source tree Ti). In this case, TT_i(u, v).c′ is set to TT_i(u, v).c, and (u, v) is added to the reported link set Ri if not already in the reported link set Ri. A delete update is included for any link (u, v) that is in the reported link set Ri but is not in the source tree Ti, such that TT_i(u, v).c>TT_i(u, v).c′. In this case, (u, v) is removed from the reported link set Ri. Any links with infinite cost are erased from the topology table TT_i.

[0147] The procedure Update_Parents( ) sets the new parent p_i(u) for each source node u to be the second node on the shortest path to node u. If there is no path to node u, p_i(u) is null. If the new parent is different from the old parent, then a NEW PARENT message is sent to the new parent (if it is not null) and a CANCEL PARENT message is sent to the old parent (if it is not null and the link to the old parent is still up). The NEW PARENT messages for all source nodes u having the same new parent are combined into a single message, and CANCEL PARENT messages are similarly combined.

[0148] The procedure Process_New_Parent( ) is executed when a NEW PARENT message is received from some neighbor node. For each source node u in the NEW PARENT message, the procedure adds the neighbor node to children_i(u) and includes in the PARENT RESPONSE message an update for each link (u, v) in the source tree Ti whose head is source node u, if such a link exists. (Such a link will not exist if node u is a leaf of source tree Ti.) As described above, the PARENT RESPONSE also includes the same list of sources as the NEW PARENT message to which it is responding. (This list is not necessary if the node sending the NEW PARENT message remembers the list and can match the PARENT RESPONSE to the NEW PARENT message.)

[0149] When the cost of a link to a neighbor node j changes, node i sets TT_i(i, j).c to the new cost and calls the procedure Update( ) with k=i and an empty input message. A threshold rule can be used so that TT_i(i, j).c is updated only if the percent difference between the new cost and the old cost is at least some given threshold. If a link to a neighbor node j fails, the same procedure is followed (with the cost changing to infinity), and node j is removed from the set of neighbor nodes Ni.
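One possible form of the threshold rule is sketched below. The function name is hypothetical, and two points are assumptions rather than statements from the text: the percent difference is taken relative to the old cost, and a change to or from infinity (link failure or recovery) always passes the test.

```python
import math


def should_update_cost(old_cost, new_cost, threshold_pct):
    """Return True if TT_i(i, j).c should be replaced by new_cost."""
    if new_cost == old_cost:
        return False
    # Assumption: a failure or recovery is always reported.
    if math.isinf(old_cost) or math.isinf(new_cost):
        return True
    # |new - old| / old * 100 >= threshold, written without division so a
    # zero old cost needs no special case.
    return abs(new_cost - old_cost) * 100 >= threshold_pct * old_cost


small_change = should_update_cost(10, 11, 20)   # 10 percent change
large_change = should_update_cost(10, 13, 20)   # 30 percent change
link_failed = should_update_cost(10, math.inf, 20)
```

A larger threshold trades route accuracy for fewer updates, since the advertised link states may be approximate.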

[0150] When a link to a neighbor node j comes up, either initially or upon recovering from a failure, node i executes the procedure Link_Up(i, j), which adds neighbor node j to the set of neighbor nodes Ni, sets TT_i(i, j).c to the link cost, and calls the procedure Update( ) with k=i and an empty input message. This may result in a NEW PARENT message being sent to neighbor node j.

[0151] To correct errors that may appear in TT_i due to noisy transmissions or memory errors, each node i can periodically generate updates for its outgoing links. Since a received update is ignored unless it has a cost that differs from the entry in the topology table TT_i, the cost of the periodic update should be chosen to be slightly different from the previous update. Alternatively, each update can contain an additional bit b, which toggles with each periodic update.

[0152] FIG. 5 illustrates the operation of the second partial-topology embodiment of the TBRPF protocol when a communication link 142 between nodes B and D in the subnet 10 fails. The minimum-hop-path tree for source node B before the link failure is shown with solid arrows; the minimum-hop-path tree for source node C is shown with dashed arrows. As shown, node A selects node B as parent for source nodes B, D, and F, and selects node C as parent for source nodes C, E, and G. Therefore, node B reports link-state changes to node A only for links (B, A), (B, C), (B, D), and (D, F), and node C reports link-state changes to node A only for links (C, A), (C, B), (C, E), and (E, G). Neither node B nor node C would report a link-state change affecting link (F, G) to node A. Thus, unlike the full-topology embodiment of the TBRPF protocol, in which each node 14 has link information for every link in the subnet 10, the nodes 18 of this partial-topology embodiment have link-state information for less than every link in the subnet 10.

[0153] If link (B, D) fails, as shown in FIG. 5, node B reports to nodes A and C that link (B, D) has failed (cost=infinity). Node C reports to node A that link (E, D) 144 has been added to node C's minimum-hop-path source tree. After receiving these updates, node A selects node C as its new parent for source nodes D and F, and sends a NEW PARENT message to node C and a CANCEL PARENT message to node B. Node C responds by sending node A an update only for link (D, F), because link (D, F) is the only link in node C's minimum-hop-path source tree with node D or node F as the head of a link. For example, node F is the head of the link (F, G), but the link (F, G) is not in node C's minimum-hop-path source tree and is therefore not reported to node A. Although the minimum-hop-path source tree of node A is modified during the update process, node A does not generate any updates because it has no children for any source other than itself (i.e., node A).

[0154] TBRPF Protocol Messages

[0155] To disseminate link-state updates to the appropriate nodes in the subnet 10, neighboring router nodes 14 that have established bi-directional links and performed data link-to-IPv4 address resolution using TBRPF neighbor discovery (as described below) exchange TBRPF protocol messages. The IPv4 addresses are therefore available for use as node IDs in TBRPF protocol messages.

[0156] In one embodiment, the TBRPF protocol messages are sent via the User Datagram Protocol (UDP), which requires an official UDP-service port-number registration. The use of UDP/IPv4 provides several advantages over a data link level approach, including (1) IPv4 segmentation/reassembly facilities, (2) UDP checksum facilities, (3) simplified application level access for routing daemons, and (4) IPv4 multicast addressing for link state messages.

[0157] TBRPF protocol messages are sent to the IPv4 unicast address of a current neighbor or to the “All_TBRPF_Neighbors” IPv4 multicast address, presuming that an official IPv4 multicast address is assigned to “All_TBRPF_Neighbors.” In general, a message is sent to the IPv4 unicast address of a current neighbor node if all components of the message pertain only to that neighbor. Similarly, a message is sent to the All_TBRPF_Neighbors IPv4 multicast address if the message contains components that pertain to more than one neighbor. Nodes 14 are prepared to receive TBRPF protocol messages sent to their own IPv4 unicast address or the All_TBRPF_Neighbors multicast address.

[0158] Actual addressing strategies depend on the underlying data link layer. For example, for data links such as IEEE 802.11, a single, multiple access channel is available for all unicast and broadcast/multicast messages. In such cases, since channel occupancy for unicast and multicast messages is identical, it is advantageous to send a single message to the All_TBRPF_Neighbors multicast address rather than multiple unicast messages, even if the message contains components that pertain to only a subset of the current neighbor nodes. In other cases, in which point-to-point receiver directed channels are available, sending multiple unicast messages may reduce contention on the multiple access broadcast channel.

[0159] Atomic TBRPF Message Format

[0160] FIG. 6 shows an exemplary embodiment of an individual (atomic) TBRPF protocol message 160 including a message header 162 followed by a message body 164. Atomic messages may be transmitted either individually or as components of a compound TBRPF protocol message having multiple atomic messages within a single UDP/IPv4 datagram. TBRPF message headers 162 are either 32 bits or 64 bits in length, depending on whether the atomic message is BROADCAST or UNICAST, respectively.

[0161] The message header 162 includes a type field 166, a version field 168, a mode field 170, a number of sources field 172, an offset field 174, a link sequence number field 176, and a receiver identification field 178, which is used when the mode is defined as UNICAST.

[0162] The type field 166 (e.g., 4 bits) represents the atomic message type. The following are examples of atomic message types:

ACK                          1
NACK                         2
NEW_PARENT                   3
CANCEL_PARENT                4
HEARTBEAT                    5
END_OF_BROADCAST             6
LINK_STATE_UPDATE_A          7
LINK_STATE_UPDATE_B          8
RETRANSMISSION_OF_BROADCAST  9

[0163] The version field 168 (e.g., 3 bits) represents the TBRPF protocol version and provides a transition mechanism for future versions of the TBRPF protocol. Also, the version field 168 can assist the node 18 in identifying false messages purporting to be TBRPF protocol messages.

[0164] The mode field 170 (e.g., 1 bit) represents the transmission mode for the atomic TBRPF protocol message 160; the mode is either UNICAST or BROADCAST. UNICAST refers to an atomic message that must be processed by only a single neighbor node. BROADCAST refers to an atomic message that is to be processed by all neighbor nodes. (For IPv4 subnets, UNICAST implies a specific IPv4 unicast address, whereas BROADCAST implies the All_TBRPF_Neighbors IPv4 multicast address.) The following exemplary mode bits are defined:

UNICAST    0
BROADCAST  1

[0165] Messages of type ACK, NACK, NEW_PARENT, CANCEL_PARENT, RETRANSMISSION_OF_BROADCAST, and END_OF_BROADCAST are sent as UNICAST.

[0166] Messages of type LINK_STATE_UPDATE_A and LINK_STATE_UPDATE_B may be sent as either UNICAST or BROADCAST.

[0167] The number of sources field 172 (e.g., 8 bits) represents the number of sources “Num_Sources” included in the atomic message 160. The field 172 takes a value from 1 to 255 for messages of type NEW_PARENT, CANCEL_PARENT, LINK_STATE_UPDATE_A, and LINK_STATE_UPDATE_B. All other message types set Num_Sources=0.

[0168] The offset field 174 (e.g., 12 bits) represents the offset (in bytes) from the 0'th byte of the current atomic message header 162 to the 0'th byte of the next atomic message header 162 in the “compound message” (described below). An offset of 0 indicates that no further atomic messages follow. The 12-bit offset field 174, for example, imposes a 4-kilobyte length restriction on individual atomic messages.

[0169] The sequence number field 176 (e.g., 4 bits) represents the link sequence number (“LSEQ”) for this TBRPF protocol message 160.

[0170] The receiver identification field 178 (e.g., 32 bits) represents the IPv4 address of the receiving node which is to process this atomic message 160. All nodes 18 other than the node identified by the identification field 178 do not process this atomic message 160. This field 178 is used only if the mode field 170 is set to UNICAST.
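The header layout can be illustrated with a short packing sketch. Several details here are assumptions rather than part of the described format: the fields are packed most-significant-bit first in the order listed, the offset field is taken to be 12 bits wide (the width at which the listed fields total 32 bits and at which the 4-kilobyte limit holds), and the name `pack_header` is hypothetical.

```python
import ipaddress
import struct

UNICAST, BROADCAST = 0, 1
OFFSET_BITS = 12  # assumption: 4 + 3 + 1 + 8 + 12 + 4 = 32 bits


def pack_header(msg_type, version, mode, num_sources, offset, lseq, receiver=None):
    """Pack an atomic message header: 4 bytes (BROADCAST) or 8 bytes (UNICAST)."""
    fields = [(msg_type, 4), (version, 3), (mode, 1),
              (num_sources, 8), (offset, OFFSET_BITS), (lseq, 4)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    header = struct.pack("!I", word)  # network byte order
    if mode == UNICAST:
        if receiver is None:
            raise ValueError("UNICAST header needs a receiver IPv4 address")
        # Receiver identification field: 32-bit IPv4 address.
        header += ipaddress.IPv4Address(receiver).packed
    return header


broadcast_hdr = pack_header(7, 1, BROADCAST, 2, 0, 5)
unicast_hdr = pack_header(3, 1, UNICAST, 1, 0, 9, receiver="10.0.0.1")
```

The broadcast header occupies one 32-bit word, and the unicast header adds a second word for the receiver address.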

[0171] Compound TBRPF Protocol Message Format

[0172] FIG. 7 shows the format for a compound TBRPF protocol message 180, which includes multiple (i.e., “N”) atomic TBRPF messages 160, 160′, 160″ that are concatenated to form the compound message 180 within a single UDP/IPv4 packet. Atomic message headers 162, in one embodiment, are aligned on 32-bit boundaries; therefore, an atomic message body 164 with a non-integral number of 32-bit words includes 1, 2 or 3 padding bytes 182, 182′ preceding a subsequent message header 162′, 162″, respectively.
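The padding rule amounts to rounding each body up to a multiple of four bytes. A minimal sketch follows, with hypothetical function names and zero-valued padding bytes as an assumption:

```python
def padding_bytes(body_len):
    """Number of padding bytes (0 to 3) needed to reach a 32-bit boundary."""
    return (-body_len) % 4


def pad_body(body):
    """Append zero bytes (an assumed fill value) so the next header is aligned."""
    return body + b"\x00" * padding_bytes(len(body))


padded = pad_body(b"abcde")  # a 5-byte body needs 3 padding bytes
```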

[0173] TBRPF Atomic Message Body Format

[0174] The format of the atomic message body 164 depends on the value in the type field 166 in the corresponding message header 162. The following are exemplary formats for an atomic message body 164.

[0175] ACK

[0176] The ACK message carries a NULL message body. A 4-bit acknowledgment sequence number (from 0 . . . 15) is carried in the LSEQ field 176 of the TBRPF message header 162.

[0177] NACK

[0178] Each NACK message is a 16-bit vector. Each bit indicates whether the corresponding one of the last 16 messages prior to the 4-bit sequence number supplied in the LSEQ field 176 of the TBRPF message header 162 was received or lost. As described above, the LSEQ field 176 is set to the sequence number of the last broadcast message received from the neighbor node to which the NACK is being sent.
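To illustrate how a receiver of a NACK might interpret the vector, the sketch below decodes it into lost sequence numbers. The bit conventions are assumptions, since the text does not fix them: bit k (counting from the least significant bit) is taken to describe the message sent k+1 positions before LSEQ, and a set bit is taken to mean "received". The function name is hypothetical.

```python
def lost_sequence_numbers(lseq, vector):
    """Return the 4-bit sequence numbers that the 16-bit NACK vector marks as lost."""
    lost = []
    for k in range(16):
        received = (vector >> k) & 1  # assumption: 1 = received, 0 = lost
        if not received:
            # Sequence numbers are 4 bits wide, so they wrap modulo 16.
            lost.append((lseq - (k + 1)) % 16)
    return lost


nothing_lost = lost_sequence_numbers(0, 0xFFFF)
two_lost = lost_sequence_numbers(5, 0xFFFA)  # bits 0 and 2 are clear
```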

[0179] NEW PARENT

[0180] FIG. 8A shows an embodiment of an exemplary format 186 for a NEW PARENT message. The format 186 includes one or more source node identity fields 188, 188′, 188″ (generally 188) and one or more corresponding sequence number fields 190, 190′, 190″ (generally 190). Each source node identity field 188 holds a value (e.g., 32 bits) representing the IPv4 address of that source node. Each sequence number field 190 holds a value (e.g., 16 bits) representing a sequence number for the corresponding source node. FIG. 8A shows the message format for an even number of source nodes. FIG. 8B shows an alternative ending 192 for the NEW PARENT message format 186 for an odd number of source nodes.

[0181] CANCEL PARENT

[0182] FIG. 9 shows an embodiment of an exemplary format 194 for a CANCEL PARENT message. The format 194 includes one or more source node identity fields 196, 196′, 196″ (generally 196) for including the IPv4 address of each source node to which the CANCEL PARENT message applies.

[0183] HEARTBEAT

[0184] In one embodiment, the HEARTBEAT message has an eight-bit length and holds a sequence number for the broadcast channel.

[0185] END OF BROADCAST

[0186] In one embodiment, the END_OF_BROADCAST message has an eight-bit length and holds a sequence number for the broadcast channel.

[0187] Link-State Update Messages

[0188] The TBRPF protocol provides two formats for two types of link-state update messages. One type of link-state update message, referred to as type LINK_STATE_UPDATE_A, includes a single sequence number for each source node, and is therefore used only if the updates for all links coming out of the same source have the same sequence number. (For example, periodic updates have this property.) This is done to reduce the message size. The second type of link-state update message, referred to as type LINK_STATE_UPDATE_B, includes a separate sequence number for each link.

[0189] FIG. 10A shows an embodiment of an exemplary format 198 for one type of link-state update message, LINK_STATE_UPDATE_A. The format 198 includes one or more link-state updates (“lsuA”) 200, 200′, 200″ (generally 200). Each lsuA 200 represents an update message with respect to a particular source node and includes a source node identity field 202, 202′, 202″ (generally 202), a number of neighbor nodes field 204, 204′, 204″ (generally 204), and one or more neighbor node sections 206, 206′, 206″ (generally 206). For each neighbor node listed in a particular lsuA 200, each neighbor node section 206 includes a neighbor-node identity field 208, 208′, 208″ (generally 208), a sequence number field 210, 210′, 210″ (generally 210) for corresponding source nodes and neighbor nodes, and a link metrics field 212, 212′, 212″ (generally 212) for that neighbor node.

[0190] The source node identity field 202 holds a value (e.g., 32 bits) for the IPv4 address of the corresponding source node. The number of neighbor nodes field 204 holds a value (e.g., 16 bits) representing the number of neighbor nodes of the corresponding source node. The neighbor-node identity field 208 holds the IPv4 address of a neighbor node of the corresponding source node. The sequence number field 210 holds a value (e.g., 16 bits) representing a sequence number for the corresponding source and neighbor node. The link metrics field 212 holds a value (e.g., 32 bits) representing the link metrics associated with the neighbor node of the corresponding source node.

[0191] FIG. 10B shows an embodiment of an exemplary format 220 for the second type of link-state update message, LINK_STATE_UPDATE_B. The format 220 includes one or more link-state updates (“lsuB”) 222, 222′, 222″ (generally 222). Each lsuB 222 represents an update message with respect to a particular source node and includes a source node identity field 224, 224′, 224″ (generally 224), a number of neighbor nodes field 226, 226′, 226″ (generally 226), a sequence number field 228, 228′, 228″ (generally 228), and one or more neighbor node sections 230, 230′, 230″ (generally 230). For each neighbor node listed in a particular lsuB 222, each neighbor node section 230 includes a neighbor-node identity field 232, 232′, 232″ (generally 232), and a link metrics field 234, 234′, 234″ (generally 234) for that neighbor node.

[0192] The source node identity field 224 holds a value (e.g., 32 bits) for the IPv4 address of the corresponding source node. The number of neighbor nodes field 226 holds a value (e.g., 16 bits) representing the number of neighbor nodes of the corresponding source node. The sequence number field 228 holds a value (e.g., 16 bits) representing a sequence number associated with the source and neighbor nodes. The neighbor-node identity field 232 holds the IPv4 address of a neighbor node of the source node. The link metrics field 234 holds a value (e.g., 32 bits) representing the link metrics associated with the neighbor node of the corresponding source node.

[0193] RETRANSMISSION OF BROADCAST

[0194] In brief, a RETRANSMISSION_OF_BROADCAST message provides the retransmission of a compound update message in response to a NACK message. This compound message may contain one or more atomic messages of type LINK_STATE_UPDATE_A or LINK_STATE_UPDATE_B concatenated together. FIG. 11 shows an embodiment of an exemplary format 240 of a RETRANSMISSION_OF_BROADCAST message including a message header 162′″ and a compound message 180′. The message header 162′″, like the message header 162 of the atomic message format 160 described above, includes a type field 166′, a mode field 170′, a number of sources field 172′, an offset field 174′, and a link sequence number field 176′. The type field 166′ is set to RETRANSMISSION_OF_BROADCAST (e.g., =9), and the number of sources field 172′ is set to 0. The offset field 174′ is the offset (in bytes) from the 0'th byte of the current compound message header to the 0'th byte of the next compound message header 162′ in the RETRANSMISSION_OF_BROADCAST message 240. A 16-bit offset value enables concatenation of compound messages 180′ up to 64 kilobytes in length.

[0195] As described above, broadcast update messages can be retransmitted on unicast or broadcast channels. For retransmission on a unicast channel, the mode field 170′ is set to UNICAST (e.g., =0) and the atomic message header 162′″ precedes the compound message 180′. The LSEQ field 176′ holds the sequence number corresponding to the unicast channel on which the message is sent. The LSEQ field 176 of each atomic message in the compound message 180′ is the broadcast sequence number that was included in the original (broadcast) transmission of the message. Multiple RETRANSMISSION_OF_BROADCAST messages can be bundled into a compound message 180′ as described above.

[0196] Selecting a Routing Path for Transmitting Packets

[0197] Routing protocols can also be classified according to whether they find optimal (shortest) routes or sub-optimal routes. By not requiring routes to be optimal, it is possible to reduce the amount of control traffic (including routing updates) necessary to maintain the routes. However, optimal routes are desirable because they minimize delay and the amount of resources (e.g., bandwidth and power) consumed. The TBRPF protocol computes optimal routes based on the advertised link states; however, the advertised link states themselves may be approximate in order to reduce the frequency at which each link is updated.

[0198] In the full-topology embodiment of the TBRPF protocol, each routing node 14 has complete link-state information. Each routing node 14 then applies a path selection algorithm to compute preferred paths to all possible destinations, and to update these paths when link states are updated. One exemplary path selection algorithm is to apply Dijkstra's algorithm to compute shortest paths (with respect to cost, c) to all destinations. In other embodiments, the TBRPF protocol can employ any other path selection algorithm. Once preferred paths are computed, the routing table entry for node u is set to the next node on the preferred path to node u. If minimum-hop routing is desired, then the routing table entry for node u can be set to the parent p_i(u).
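As an illustration of the last step, the routing table can be derived from the predecessor map that a shortest-path computation produces. The sketch below assumes a hypothetical `pred` dictionary that maps each reachable node to its predecessor on the preferred path from the computing node `src` (with `src` mapped to None); it is not taken from Appendix A.

```python
def next_hops(pred, src):
    """Return a routing table: destination -> next node on the path from src."""
    table = {}
    for dest in pred:
        if dest == src:
            continue
        node = dest
        # Walk back toward src until reaching the node adjacent to src.
        while pred[node] != src:
            node = pred[node]
        table[dest] = node
    return table


# Hypothetical preferred paths from node A: A-B-D-F and A-C.
pred = {"A": None, "B": "A", "C": "A", "D": "B", "F": "D"}
routing_table = next_hops(pred, "A")
```

Every destination behind node B, including node F two hops farther on, maps to the same next hop.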

[0199] 2. Neighbor Discovery

[0200] Each routing node 14 running the TBRPF protocol uses a neighbor discovery protocol to detect the establishment of new links to new neighbor nodes and the loss of established links to existing neighbor nodes. In general, the neighbor discovery protocol dynamically establishes bi-directional links and detects bi-directional link failures through the periodic transmission of HELLO messages. The neighbor discovery protocol is both automatic and continuous, and may include a data link-to-IPv4 address resolution capability. Because the neighbor discovery protocol is responsible for both link state maintenance and data link-to-IPv4 address resolution in the subnet 10, the neighbor discovery protocol operates as a data-link-level protocol.

[0201] FIG. 12 shows an exemplary embodiment of a process 250 used by the nodes 18 to perform neighbor discovery. The process uses the following three types of control messages: HELLO, NEIGHBOR, and NEIGHBOR ACK. This embodiment of the neighbor discovery protocol operates as follows. Each node i in the subnet periodically transmits (step 252) a HELLO message at predetermined timeout intervals (e.g., HELLO_INTVL=0.5 seconds). (The HELLO_INTVL value is common to all nodes 18 within the subnet 10, but different subnets may use different HELLO_INTVL values.) HELLO messages are sent to the data link level broadcast address and include the identity of transmitting node i.

[0202] A node j receiving a HELLO message from a new neighbor, node i, responds (step 254) with a NEIGHBOR message containing the identity of node j, sending the NEIGHBOR message to the data link unicast address of the new neighbor node i. Then, upon receiving the NEIGHBOR message, node i sends (step 256) a NEIGHBOR ACK to node j using the data link unicast address of node j. The NEIGHBOR ACK message contains the identity of node i. The NEIGHBOR and NEIGHBOR ACK messages also contain the current link-level sequence number for the broadcast channel (discussed below). Thus, a link from node i to node j is established by node i receiving a NEIGHBOR packet from node j, and a link from node j to node i is established by node j receiving a NEIGHBOR ACK packet from node i. The link to an existing neighbor is declared to be down if no traffic (including HELLO messages and ACKs) has been received from the neighbor node within a predetermined time interval (e.g., within the last LINKDOWN_INTVL=2.0 seconds).

[0203] Implementations of this embodiment of the neighbor discovery protocol should detect the event of a data link-to-IP address mapping change for existing links. This may occur in one of the following instances:

[0204] 1. Two or more nodes in the subnet 10 are using the same IP address.

[0205] 2. An existing node in the subnet 10 has changed its data link layer address.

[0206] 3. A new node is now using the IP address of a former node that may have left the subnet 10.

[0207] In the first instance, the implementation should print some form of “duplicate IP address detected” message to the console. In the second and third instances, the cached link state should be updated to reflect the new data link-to-IPv4 address mapping.

[0208] FIG. 13 shows an exemplary embodiment of a packet format 260 for the HELLO, NEIGHBOR, and NEIGHBOR ACK neighbor discovery protocol messages on the subnet 10. The data link header for each message is not shown, since it is specific to the underlying data link layer.

[0209] The eight-bit “Type” field 262 indicates the type of message. For example, each message can be identified by the following examples of values in the Type field 262:

HELLO         10
NEIGHBOR      11
NEIGHBOR_ACK  12

[0210] The eight-bit “BCAST Seq#” field 264 indicates a sequence number from 0 . . . 15 (4 bits), used in NEIGHBOR and NEIGHBOR ACK messages as described above. The four address fields (sender hardware address 266; sender protocol address 268; target hardware address 270; target protocol address 272) facilitate the address resolution process. The fields 266, 268, 270, and 272 contain the following examples of values, based on the type of neighbor discovery message:

[0211] Message type is HELLO

[0212] Sender Hardware Address 266: data link address of sender

[0213] Sender Protocol Address 268: IPv4 address of sender

[0214] Target Hardware Address 270: data link broadcast address

[0215] Target Protocol Address 272: unused

[0216] Message type is NEIGHBOR

[0217] Sender Hardware Address 266: data link address of sender

[0218] Sender Protocol Address 268: IPv4 address of sender

[0219] Target Hardware Address 270: sender H/W address from received HELLO

[0220] Target Protocol Address 272: sender IP address from received HELLO

[0221] Message type is NEIGHBOR ACK

[0222] Sender Hardware Address 266: data link address of sender

[0223] Sender Protocol Address 268: IP address of sender

[0224] Target Hardware Address 270: sender H/W address from NEIGHBOR

[0225] Target Protocol Address 272: sender IP address from NEIGHBOR

[0226] Usage of the other fields 274, 276, 278, and 280 in the packet 260 is described in “An Ethernet Address Resolution Protocol: Or Converting Network Protocol Addresses To 48.Bit Ethernet Addresses For Transmission On Ethernet Hardware,” by David C. Plummer, Request for Comments (RFC) No. 826, November 1982.

[0227] Reduced Overhead Hello Protocol

[0228] Another embodiment of the neighbor discovery protocol, hereafter referred to as Reduced Overhead Hello Protocol (ROHP), is suited for MANETs. As described further below, the ROHP is suited for MANETs because the protocol can operate correctly although an asymmetric (unidirectional) link may exist between any two nodes at any time, link states may change frequently due to node mobility and interference, and the channel may be noisy so that not all transmitted packets are successfully received by all neighbor nodes. An objective of ROHP is to allow each node 18 in the MANET 10 to quickly detect the neighbor nodes with which that node 18 has a direct and symmetric link (i.e., a bi-directional link such that the node at each end of the link can hear the other node). The ROHP also detects when a symmetric link to some neighbor no longer exists.

[0229] In brief overview, the ROHP reports each change in the state of a neighbor node (e.g., “heard”, “symmetric”, or “lost”) in only the next K HELLO messages, where K is a small positive integer (e.g., K=3 to 5) such that a node declares a neighbor to be “lost” if it does not receive any complete HELLO message from that neighbor node within a time period equal to K number of time intervals. Each time interval is hereafter referred to as HELLO_INTERVAL, which is for example 0.5 to 2 seconds. In contrast, each HELLO message of conventional neighbor discovery protocols (e.g., as in OSPF and OLSR (Optimized Link-State Routing Protocol)) includes the identities (or addresses) of all neighbors.

[0230] Neighbor Table

[0231] Each node 18 maintains a neighbor table, which has an entry for each known neighbor node and stores state information for that neighbor node. An entry for neighbor node B, for example, contains the following variables:

[0232] state(B): The current state of the link to neighbor node B, which can be “heard”, “symmetric”, or “lost”.

[0233] hold_time(B): The amount of time (in seconds) remaining before state(B) must be changed to “lost” if no further complete HELLO message from B is received.

[0234] counter(B): The number of subsequent HELLO messages that include the identity of the neighbor node B in the list corresponding to state(B).

[0235] The entry for neighbor node B may be deleted from the table if state(B) remains equal to “lost” for a period not less than K*HELLO_INTERVAL.

[0236] The three possible states of a neighbor node B have the following meaning at node A:

[0237] “Heard”: A complete HELLO message was received from neighbor node B within the last K*HELLO_INTERVAL seconds, but it is unknown whether neighbor node B can hear node A.

[0238] “Symmetric”: Nodes A and B can hear each other.

[0239] “Lost”: No complete HELLO message has been received from neighbor node B within the last K*HELLO_INTERVAL seconds.

[0240] Sending HELLO Messages

[0241] Each node 18 sends a HELLO message periodically every HELLO_INTERVAL seconds, possibly with a small jitter to avoid repeated collisions. Because of message size limitations that may be imposed by the MANET 10, a HELLO message may be too large to send within one packet, in which case the sending node 18 sends the HELLO message in multiple packets within a period equal to the HELLO_INTERVAL. Depending on the implementation of the ROHP, the receiving node may or may not be able to extract information from a partially received HELLO message.

[0242] A HELLO packet sent by a node includes the following information:

[0243] 1). The identity (e.g., IP address) of the sending node.

[0244] 2). A list of all neighbor nodes that recently changed to the “heard” state. More specifically, a list of identities of neighbor nodes B such that state(B)=“heard” and counter(B)>0.

[0245] 3). A list of all neighbor nodes that recently changed to the “symmetric” state. More specifically, a list of identities of neighbor nodes B such that state(B)=“symmetric” and counter(B)>0.

[0246] 4). A list of all neighbor nodes that recently changed to the “lost” state. More specifically, a list of identities of neighbor nodes B such that state(B)=“lost” and counter(B)>0.

[0247] Whenever a neighbor node B is included in one of the above three lists, counter(B) decrements by 1. As a result, each state change is included in at most K HELLO messages, and in some cases (as described below) is not included in any HELLO message. HELLO messages can also contain other information, as discussed below.
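The construction of a HELLO message from the neighbor table can be sketched as follows. This is a minimal illustration only, assuming a dictionary-based message and a hypothetical `Neighbor` record and `build_hello` helper; none of these names come from the protocol description, and K=3 is an assumed value within the range suggested above.

```python
from dataclasses import dataclass

K = 3  # assumed value; the description suggests K = 3 to 5


@dataclass
class Neighbor:
    """Hypothetical neighbor-table entry holding state(B), hold_time(B), counter(B)."""
    state: str = "lost"      # "heard", "symmetric", or "lost"
    hold_time: float = 0.0   # seconds remaining before the neighbor is declared lost
    counter: int = 0         # number of upcoming HELLO messages that still list B


def build_hello(sender_id, table):
    """Build one HELLO message and decrement counter(B) for every listed neighbor."""
    hello = {"sender": sender_id, "heard": [], "symmetric": [], "lost": []}
    for node_id, entry in sorted(table.items()):
        if entry.counter > 0:
            # The neighbor goes into the list that matches its current state.
            hello[entry.state].append(node_id)
            entry.counter -= 1
    return hello
```

Calling `build_hello` K times after a state change lists the neighbor K times and then omits it, which is the behavior that keeps HELLO messages small.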

[0248] Receiving a HELLO Message

[0249] FIG. 14 shows an exemplary embodiment of a process by which each node 18 operating according to the ROHP neighbor discovery processes a received HELLO message. In step 288, a node (referred to as receiving node A) receives a partial or complete HELLO message from a neighbor node B. Because a HELLO message must be transmitted within a time interval of length HELLO_INTERVAL, the receiving node A declares the HELLO message to be partial if not all of its parts have been received within a time interval of this length. If the HELLO message is complete and an entry for neighbor node B does not exist in the table, the receiving node A creates (step 290) such an entry with state(B)=“lost”. If the HELLO message is complete, receiving node A also sets (step 292) the variable hold_time(B) to K*HELLO_INTERVAL. The value of the variable hold_time(B) decreases to 0 (expires) if no HELLO message from neighbor node B is subsequently received within K*HELLO_INTERVAL seconds. When hold_time(B) expires, the receiving node A sets state(B) to “lost” and counter(B) to K. This indicates that the receiving node A is to include the identity of node B in the list of “lost” neighbor nodes in the transmission of the next K HELLO messages or until state(B) changes again (whichever occurs first).
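The hold-time handling of steps 290 and 292 can be sketched as below. This is an illustrative sketch, not the claimed implementation: the dictionary layout and the helper names `on_complete_hello` and `advance_time` are assumptions, as are the values chosen for K and HELLO_INTERVAL.

```python
K = 3                 # assumed; the description suggests K = 3 to 5
HELLO_INTERVAL = 1.0  # seconds, assumed; the description suggests 0.5 to 2


def on_complete_hello(table, b):
    """Steps 290 and 292: create the entry for B if missing, then restart hold_time(B)."""
    entry = table.setdefault(b, {"state": "lost", "hold_time": 0.0, "counter": 0})
    entry["hold_time"] = K * HELLO_INTERVAL
    return entry


def advance_time(table, elapsed):
    """Count down hold_time for every neighbor; on expiry mark the neighbor lost."""
    for entry in table.values():
        if entry["hold_time"] <= 0.0:
            continue  # already expired, nothing left to count down
        entry["hold_time"] = max(0.0, entry["hold_time"] - elapsed)
        if entry["hold_time"] == 0.0:
            # Expiry: report B as "lost" in the next K HELLO messages.
            entry["state"] = "lost"
            entry["counter"] = K
```

A timer-driven implementation would call `advance_time` periodically; the sketch uses an explicit elapsed time so that the behavior is easy to check.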

[0250] The receiving node A then performs (step 294) an action based on whether the received HELLO message is complete, whether the receiving node A appears in a list within the received HELLO message, and if so, which list, and the current state of the neighbor node B (i.e., state(B)). The actions performed by the receiving node A are summarized in Table 3 below.

TABLE 3
state(B) | Action: receiving node A is not in any list | Action: receiving node A is in “lost” list | Action: receiving node A is in “heard” list | Action: receiving node A is in “symmetric” list
lost | If msg is complete, set state(B) to “heard” and counter(B) to K | If msg is complete, set state(B) to “heard” and counter(B) to K | If msg is complete, set state(B) to “symmetric” and counter(B) to K | If msg is complete, set state(B) to “symmetric” and counter(B) to K
heard | No action | No action | Set state(B) to “symmetric” and counter(B) to K | Set state(B) to “symmetric” and counter(B) to K
symmetric | No action | Set state(B) to “heard” and counter(B) to 0 | If counter(B)=0, set counter(B) to K | Set counter(B) to 0

[0251] Accordingly, upon receiving a complete or partial HELLO message from neighbor node B, the action performed by the receiving node A is as follows.

[0252] 1. If state(B)=“lost” and the HELLO message is complete, and if the message does not include node A in any list or includes node A in the “lost” list, then set state(B) to “heard” and counter(B) to K. If state(B)=“lost” and the HELLO message is complete, and if the message includes node A in the “heard” or “symmetric” list, then set state(B) to “symmetric” and counter(B) to K.

[0253] 2. If state(B)=“heard” and the message includes node A in the “heard” list, then set state(B) to “symmetric” and counter(B) to K. If state(B)=“heard” and the message includes node A in the “symmetric” list, then set state(B) to “symmetric” and counter(B) to 0. (In this case, the receiving node A need not include node B in its HELLO messages, since both nodes A, B already know that the link is symmetric.)

[0254] 3. If state(B)=“symmetric” and the message includes node A in the “heard” list and counter(B)=0, then set counter(B) to K. If counter(B)>0, then counter(B) need not be set to K, because the “heard” entry is just a repeat of one that was included in a recently received HELLO message from B. If state(B)=“symmetric” and the message includes node A in the “symmetric” list, then set counter(B) to 0. (Both nodes know that the link is symmetric.) If state(B)=“symmetric” and the message includes receiving node A in the “lost” list, then set state(B) to “heard” and counter(B) to 0. (Node B cannot hear node A, but node A can hear node B.) Note that a complete HELLO message must be received in order to create a new entry in the neighbor node table or to change the state of a neighbor node from “lost” to “heard” or to “symmetric.” This prevents the creation of a link that has poor quality.
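The three numbered cases above can be sketched as a small state-transition function. This is a hedged illustration: the function name `process_hello`, the dictionary entry, and the `listed_in` argument are hypothetical, and the sketch follows the numbered cases as written (in particular, case 2 sets counter(B) to 0 when node A appears in the “symmetric” list).

```python
K = 3  # assumed value; the description suggests K = 3 to 5


def process_hello(entry, listed_in, complete):
    """Update state(B) and counter(B) at receiving node A.

    entry     -- dict with keys "state" and "counter" for neighbor B
    listed_in -- None, "lost", "heard", or "symmetric": the list of B's
                 HELLO message in which node A appears
    complete  -- True if the whole HELLO message was received
    """
    state = entry["state"]
    if state == "lost":
        if not complete:
            return  # a partial message never leaves the "lost" state
        if listed_in in (None, "lost"):
            entry["state"], entry["counter"] = "heard", K
        else:
            entry["state"], entry["counter"] = "symmetric", K
    elif state == "heard":
        if listed_in == "heard":
            entry["state"], entry["counter"] = "symmetric", K
        elif listed_in == "symmetric":
            # Both ends already know the link is symmetric; nothing to report.
            entry["state"], entry["counter"] = "symmetric", 0
    elif state == "symmetric":
        if listed_in == "heard":
            if entry["counter"] == 0:
                entry["counter"] = K
        elif listed_in == "symmetric":
            entry["counter"] = 0
        elif listed_in == "lost":
            # B cannot hear A, but A can still hear B.
            entry["state"], entry["counter"] = "heard", 0
```

Keeping the transition logic in one pure function makes each case easy to check against the numbered rules.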

[0255] Variations of ROHP

[0256] In other embodiments, HELLO messages can be augmented to include enough information to inform each neighbor node of the set of neighbor nodes with which the sending node has symmetric links. This can be accomplished by setting the counter(B) to K, rather than to 0 (see case 2 above), so that node B is included in the “symmetric” list of the next K HELLO messages, even though nodes A and B already know that the link between them is symmetric.

[0257] In addition, node A can inform any new neighbor node of the set of neighbor nodes with which node A has symmetric links. Node A can distribute this information by (a) including the set of all neighbor nodes to which symmetric links exist in the next K HELLO messages, whenever the state of the new neighbor node changes to “symmetric”; or (b) sending this information in a separate message that is unicast reliably to the new neighbor node.

[0258] HELLO messages can also be augmented to include other information, such as link metrics, sequence numbers, states of non-adjacent links, time stamps, designated routers, special relays, and other data.

[0259] The ROHP can be used in conjunction with any routing protocol that uses HELLO messages for neighbor discovery, such as TBRPF (described herein), OSPF, and OLSR. An advantage of the ROHP over existing neighbor discovery protocols, such as those discovery protocols used within OSPF and OLSR, is that ROHP employs HELLO messages that on average are smaller than those of such neighbor discovery protocols because each neighbor state change observed by a node is included in at most K HELLO messages (unlike OSPF and OLSR), resulting in reduced communications overhead and bandwidth consumption. In addition, because HELLO messages are smaller, they can be sent more frequently, resulting in a faster detection of topology changes.

[0260] 3. IPv6-IPv4 Compatibility Address Format

[0261] Referring back to the subnet 10 in FIG. 1, assume, for example, that the nodes 18 of subnet 10 belong to one domain, and that the gateway 16 is the border gateway 16 for that domain. Assume also that both the IP host A 12 and the gateway 16 are IPv6 nodes 18, and that the other nodes 18 in the subnet 10 are IPv4 nodes, without any IPv6 routing capability. Any route taken by packets sent by the IP host A 12 to the server 40 on the Internet 30 necessarily traverses IPv4 infrastructure to reach the gateway 16. To communicate across the subnet 10 with the heterogeneous IP infrastructure, the IP host 12 and the gateway 16 use aggregatable, global, unicast addresses, each hereafter referred to as an “IPv6-IPv4 compatibility address.” The use of IPv6-IPv4 compatibility addresses enables IPv6 nodes (1) to forward IPv6 packets across native IPv6 routing infrastructure or (2) to automatically tunnel IPv6 packets over IPv4 routing infrastructure without requiring a pre-configured tunnel state. A routing node 14 with an IPv6-IPv4 compatibility address can serve as a router for nodes 18 with native IPv6 addresses (i.e., IPv6 addresses that are not IPv6-IPv4 compatibility addresses) connected to the same link. On behalf of such native IPv6 nodes, the IPv6-IPv4 routing node 14 can automatically tunnel messages across the IPv4 infrastructure of the subnet 10 to reach the border gateway 16.

[0262] FIG. 15A shows an exemplary embodiment of a format 300 for IPv6-IPv4 compatibility addresses. The format 300 includes a 64-bit address prefix 302 and a 64-bit interface identifier 304. The address prefix 302 specifies a standard 64-bit IPv6 routing prefix, such as that described in the Internet RFC (request for comment) #2374. The address prefix 302 includes a 3-bit Format Prefix (FP) 303, which for all IPv6-IPv4 compatibility addresses is set to “001”, and aggregation identifiers 305. Consequently, the format prefix 303 and the topologically correct aggregation identifiers 305 of the IPv6-IPv4 compatibility addresses are the same as those of IPv6 addresses assigned to IPv6 nodes, enabling IPv6 nodes to route IPv6 packets using IPv6-IPv4 compatibility addresses across IPv6 infrastructure. The 64-bit interface identifier 304 is a specially constructed 64-bit global identifier interface identifier (i.e., 64-bit EUI-64).

[0263] FIG. 15B shows an embodiment of the interface identifier 304 including a 24-bit company identifier 306 concatenated with a 40-bit extension identifier 308. In one embodiment, the 24-bit company identifier 306 is a special IEEE Organizationally Unique Identifier (OUI) reserved by the Internet Assigned Numbers Authority (IANA) for supporting the IPv6-IPv4 compatibility addresses. The IEEE Registration Authority (IEEE/RAC) assigns the OUI to an organization and the organization owning that OUI typically assigns the 40-bit extension identifier 308. In FIG. 15B, the string of ‘c’s represents the company-specific bits of the OUI, the bit ‘u’ represents the universal/local bit, the bit ‘g’ represents the individual/group bit and the string of ‘m’s are the extension identifier bits. Here, when the bit ‘u’ equals 1, the scope of the address is global and when the bit ‘u’ equals 0, the scope is local.

[0264] To support encapsulation of legacy IEEE EUI-48 (24-bit) extension identifier values, the first two octets of the 40-bit extension identifier 308 (i.e., bits 24 through 39 of the address) are set to 0xFFFE if the extension identifier 308 encapsulates an EUI-48 value. Further, the first two octets of the extension identifier 308 are not set to 0xFFFF, as this value is reserved by the IEEE/RAC. All other 40-bit extension identifier values are available for assignment by the addressing authority responsible for a given OUI. Thus, as described further below, the IPv6-IPv4 compatibility address format 300 enables embedding an IPv4 address in an IPv6-IPv4 compatibility address without sacrificing compliance with the EUI-64 bit format.

[0265] FIG. 15C shows an embodiment of the interface identifier 304 including an OUI field 306, a type field 310, a type-specific extension field (TSE) 312, and a type-specific data field (TSD) 314. As shown, the OUI field 306 includes the OUI of IANA (e.g., 00-00-5E), with ‘u’ and ‘g’ bits. The type field 310 indicates how the TSE 312 and TSD 314 fields are interpreted; in general, the type field 310 indicates whether the interface identifier 304 encapsulates an IPv4 address that is suitable for automatic intra-subnet IPv6-in-IPv4 tunneling. Table 1 shows the interpretations of TSE and TSD for various values in the type field 310:

TABLE 1
TYPE Value | (TSE, TSD) Interpretation
0x00-0xFD | RESERVED
0xFE | (TSE, TSD) together contain an embedded IPv4 address
0xFF | TSD is interpreted based on the value of TSE as shown in TABLE 2

[0266] TABLE 2
TSE Value | TSD Interpretation
0x00-0xFD | RESERVED for future use (e.g., by IANA)
0xFE | TSD contains 24-bit EUI-48 interface identifier
0xFF | RESERVED (e.g., by IEEE/RAC)

[0267] Thus, if an IPv6-IPv4 compatibility address has TYPE=0xFE, the TSE field 312 is treated as an extension of the TSD field 314, which indicates that the IPv6-IPv4 compatibility address includes a valid IPv6 prefix and an embedded IPv4 address.

[0268] If the IPv6-IPv4 compatibility address has TYPE=0xFF, the TSE field 312 is treated as an extension of the TYPE field 310. When TSE=0xFE, the TSD field 314 includes a 24-bit EUI-48 interface identifier. Thus, the IPv6-IPv4 compatibility address format 300 conforms to all requirements of a 64-bit global identifier (i.e., the EUI-64 format) and supports encapsulation of EUI-48 interface identifiers (i.e., when TSE=0xFE). For example, an existing IANA EUI-48 format multicast address such as:

[0269] 01-00-5E-01-02-03

[0270] is written in the IANA EUI-64 format as:

[0271] 01-00-5E-FF-FE-01-02-03.
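The EUI-48 to EUI-64 rewriting in this example amounts to inserting the octets FF-FE after the three OUI octets. A minimal sketch, assuming hyphen-separated octet strings and a hypothetical helper name `eui48_to_eui64`:

```python
def eui48_to_eui64(eui48: str) -> str:
    """Insert the octets FF-FE between the OUI and the 24-bit extension identifier."""
    octets = eui48.split("-")
    if len(octets) != 6:
        raise ValueError("an EUI-48 value has exactly six octets")
    return "-".join(octets[:3] + ["FF", "FE"] + octets[3:])
```

For instance, `eui48_to_eui64("01-00-5E-01-02-03")` yields the EUI-64 form given in the example above.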

[0272] Other values for TYPE and, hence, other interpretations of the TSE and TSD fields 312, 314 are reserved for future use.
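The interpretation rules of Tables 1 and 2 can be sketched as a decoder for the 8-octet interface identifier. This is only an illustration under stated assumptions: the function name `interpret_interface_id` and the returned tuples are hypothetical, and the OUI check masks out the ‘u’ and ‘g’ bits as the two low-order bits of the first octet.

```python
import ipaddress


def interpret_interface_id(iid: bytes):
    """Classify an 8-octet interface identifier: OUI (3), TYPE (1), TSE (1), TSD (3)."""
    if len(iid) != 8:
        raise ValueError("interface identifier must be 8 octets")
    oui, type_, tse, tsd = iid[:3], iid[3], iid[4], iid[5:]
    # Compare against the IANA OUI 00-00-5E, ignoring the 'u' and 'g' bits.
    if (oui[0] & 0xFC, oui[1], oui[2]) != (0x00, 0x00, 0x5E):
        return ("not-iana-oui", None)
    if type_ == 0xFE:
        # TSE and TSD together hold the embedded IPv4 address.
        return ("ipv4", str(ipaddress.IPv4Address(iid[4:8])))
    if type_ == 0xFF and tse == 0xFE:
        # TSD holds the 24-bit extension identifier of an EUI-48 value.
        return ("eui48", tsd.hex())
    return ("reserved", None)
```

A caller might pass the low 8 octets of an IPv6 address, for example `ipaddress.IPv6Address(addr).packed[8:]`.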

[0273] FIG. 15D shows a specific example of an IPv6-IPv4 compatibility address 316 for a node 18 with an IPv4 address of 140.173.129.8. This IPv4 address may be assigned an IPv6 64-bit address prefix 302 of 3FFE:1a05:510:200::/64. Accordingly, the IPv6-IPv4 compatibility address 316 for this IPv4 node is expressed as:

[0274] 3FFE:1a05:510:200:0200:5EFE:8CAD:8108

[0275] In an alternative form, the IPv6-IPv4 compatibility address 316 with the embedded IPv4 address is expressed as:

[0276] 3FFE:1a05:510:200:0200:5EFE:140.173.129.8

[0277] Here, the least significant octet of the OUI (02-00-5E) in the interface identifier 304 is 0x02 instead of 0x00 because the bit ‘u’ is set to 1 for global scope.

[0278] Similarly, the IPv6-IPv4 compatibility addresses for the link-local and site-local (i.e., within the subnet 10) variants, respectively, are:

[0279] FE80::0200:5EFE:140.173.129.8

[0280] FEC0::200:0200:5EFE:140.173.129.8
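The address construction in this example can be reproduced with Python's standard `ipaddress` module. The sketch below is illustrative only; `compat_address` and its `global_scope` parameter are hypothetical names. It uses the embedded value 8CAD:8108, which is 140.173.129.8 in dotted-decimal form.

```python
import ipaddress


def compat_address(prefix64: str, ipv4: str, global_scope: bool = True):
    """Concatenate a 64-bit IPv6 prefix, the IANA OUI with TYPE=0xFE, and an IPv4 address."""
    net = ipaddress.IPv6Network(prefix64)
    if net.prefixlen != 64:
        raise ValueError("a 64-bit prefix is required")
    # 02-00-5E-FE when the 'u' bit is 1 (global scope), 00-00-5E-FE when it is 0.
    oui_type = 0x02005EFE if global_scope else 0x00005EFE
    iid = (oui_type << 32) | int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address(int(net.network_address) | iid)
```

The same helper covers the link-local variant by passing `"FE80::/64"` as the prefix, and local-scope addresses by passing `global_scope=False`.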

[0281] As previously noted, the IPv6-IPv4 compatibility address format 300 enables IPv6 nodes to tunnel IPv6 packets through a one-time IPv6-in-IPv4 tunnel across IPv4 routing infrastructure. FIG. 15E shows an embodiment of a packet header 320 used for tunneling IPv6 packets using IPv6-IPv4 compatibility addresses across IPv4 routing infrastructure. In this embodiment, the header 320 includes a 20-byte IPv4 header 322 and a 40-byte IPv6 header 324. The IPv6 header 324 includes an IPv6 address 329 of the node that is the source of the IPv6 packet and an IPv6-IPv4 compatibility address 316 associated with the final IPv6 destination node. The IPv4 header 322 includes the IPv4 address 326 of the dual-stack node that “re-routes” the IPv6 packet by tunneling the IPv6 packet through the IPv4 routing infrastructure. The IPv4 header 322 also includes the IPv4 address 328 of an IPv4 destination node that typically is the same as the IPv4 address embedded within the IPv6 destination address' IPv6-IPv4 compatible interface identifier 304. Alternatively, the IPv4 address 328 can be the IPv4 address of the next-hop IPv6 gateway that has a path to the final IPv6 destination address and, therefore, can forward the IPv6 packet towards the final IPv6 destination node.

[0282] Upon receiving the tunneled IPv6 packet, the IPv4 destination node determines that the IPv6 header 324 includes an IPv6-IPv4 compatibility address 316 and can route the IPv6 packet to the IPv6 destination node identified by that IPv6-IPv4 compatibility address.

[0283] Address Aggregation

[0284] One advantage of embedding an IPv4 address in the interface identifier 304 of an IPv6 address is that large numbers of IPv6-IPv4 compatibility addresses 316 can be assigned within a common IPv6 routing prefix 302, thus providing aggregation at the border gateway 16. For example, a single 64-bit IPv6 prefix 302 for the subnet 10, such as 3FFE:1a05:510:2418::/64, can include millions of nodes 18 with unique IPv4 addresses embedded in the interface identifier 304 of the IPv6-IPv4 compatibility addresses. This aggregation feature allows a “sparse mode” deployment of IPv6 nodes throughout a large Intranet comprised predominantly of IPv4 nodes.

[0285] Globally and Non-Globally Unique IPv4 Addresses

[0286] Another advantage is that IPv6-IPv4 compatibility addresses 316 support subnets that use globally unique IPv4 address assignments and subnets that use non-globally unique IPv4 addresses, such as when private address assignments and/or network address translation (NAT) are used.

[0287] Non-globally Unique IPv4 Addresses

[0288] IPv4 addresses need not be globally unique but may be allocated through a private network-addressing scheme that has meaning only within the context of that domain. IPv6-IPv4 compatibility addresses for private IPv4 addresses set the ‘u’ bit to 0 for local scope. For example, a node with the private, non-globally unique IPv4 address 10.0.0.1 can be assigned the IPv6-IPv4 compatibility address of

[0289] 3FFE:1a05:510:200:0000:5EFE:10.0.0.1,

[0290] which uses the same example IPv6 64-bit prefix and IANA OUI (00-00-5E) described above, with the ‘u’ bit in the EUI-64 interface identifier indicating that this is a local address.

[0291] Routing with IPv6-IPv4 Compatibility Addresses

[0292] By embedding an IPv4 address in the interface identifier 304 of an IPv6-IPv4 compatibility address 316, IPv6 packets can be routed globally over the IPv6 infrastructure or tunneled locally across portions of the IPv4 infrastructure of the subnet 10 that have no IPv6 routing support. Thus, the compatibility-addressing scheme supports heterogeneous IPv6/IPv4 infrastructures in transition with incremental deployment of IPv6 nodes within the subnet 10.

[0293] Intra-domain Routing

[0294] FIG. 16 shows an exemplary embodiment of an intra-domain routing process 330 by which a routing node 14, configured with IPv6 and IPv4 routing tables, routes a packet having the IPv6-IPv4 compatibility address. Upon receiving the packet, the routing node 14 has IPv6 node software that checks (step 332) for the special IETF OUI 306 and the type field 310 encapsulated in the interface identifier 304. If the software finds the special OUI 306 and the value of 0xFE in the type field 310, this means that the received packet has an IPv6 prefix and an embedded IPv4 address.

[0295] The routing node 14 then determines (step 334) if any IPv6 routing information leads to the destination node 18; that is, if the IPv6 routing table has an entry for ‘default’ (i.e., the default gateway) or for the IPv6 prefix of the destination node 18. If such an entry is found, the router 14 determines (step 336) whether there is a path through IPv6 routing infrastructure to the gateway 16 for the IPv6 prefix of the destination node 18. If there is such an IPv6 path, then the router 14 sends (step 338) the packet as an IPv6 packet to the IPv6 gateway 16 for that IPv6 prefix.

[0296] If no such IPv6 path to the gateway 16 through IPv6 routing infrastructure exists, the routing node 14 construes (step 340) the last four bytes of the extension identifier 308 as an IPv4 address embedded in the IPv6-IPv4 compatibility address. The routing node 14 then determines (step 342) if the IPv4 routing table includes an entry for a prefix of the embedded IPv4 address of the destination.

[0297] Upon finding such an entry, the routing node 14 encapsulates (step 342) the IPv6 packet for tunneling through the IPv4 routing infrastructure using the embedded IPv4 address as the destination for the tunneled packet. (The general format for an encapsulated packet is shown in FIG. 15E.) One technique for automatically tunneling the IPv6 packet is described in “Transition Mechanism for IPv6 Hosts and Routers,” by R. Gilligan and E. Nordmark, draft-ietf-ngtrans-mech-04.txt (work in progress). This technique can also be applied to the IPv6-IPv4 compatibility address. This implies that the gateway 16 also uses IPv6-IPv4 compatibility addresses.
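The decision sequence of process 330 can be sketched as follows. This is a simplified illustration, not the process of FIG. 16 itself: the function name `route_decision`, the list-based routing tables, and the boolean `ipv6_path_to_gateway` are assumptions. The description does not state what happens when neither an IPv6 path nor a matching IPv4 route exists, so the "drop" result is also an assumption, as is treating a missing IPv6 routing entry the same way as a missing IPv6 path.

```python
import ipaddress


def route_decision(dst, ipv6_routes, ipv4_routes, ipv6_path_to_gateway):
    """Return ("ipv6", None), ("tunnel", embedded_ipv4), or ("drop", None)."""
    addr = ipaddress.IPv6Address(dst)
    iid = addr.packed[8:]
    # Step 332: look for the special OUI (ignoring 'u' and 'g' bits) and TYPE=0xFE.
    is_compat = (iid[0] & 0xFC, iid[1], iid[2], iid[3]) == (0x00, 0x00, 0x5E, 0xFE)
    # Step 334: is there a default ("::/0") or prefix entry covering the destination?
    has_v6_entry = any(addr in ipaddress.IPv6Network(p) for p in ipv6_routes)
    # Steps 336 and 338: forward natively when an IPv6 path to the gateway exists.
    if has_v6_entry and ipv6_path_to_gateway:
        return ("ipv6", None)
    if is_compat:
        # Step 340: the last four octets are the embedded IPv4 address.
        v4 = ipaddress.IPv4Address(iid[4:])
        # Step 342: tunnel if the IPv4 routing table covers that address.
        if any(v4 in ipaddress.IPv4Network(p) for p in ipv4_routes):
            return ("tunnel", str(v4))
    return ("drop", None)  # assumed fallback; not specified in the description
```

The returned IPv4 address would become the destination of the outer IPv4 header when the packet is encapsulated.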

[0298] Inter-domain Routing

[0299] Globally Unique IPv4 Addresses without Privacy Concerns

[0300] Where nodes 18 within a heterogeneous IPv6/IPv4 subnet 10 use globally unique IPv4 addresses and where no privacy concerns exist regarding exposure of internal IPv4 addresses to the public Internet, messages may be routed across domain boundaries using the same routing process 330 described above in FIG. 16.

[0301] Globally Unique IPv4 Addresses with Privacy Concerns

[0302] One advantage of the IPv6-IPv4 compatibility address format 300 is that the format 300 does not necessarily expose the true identification of the sending node, if an administrative authority for the subnet 10 wishes to enforce a policy of not exposing internal IPv4 addresses outside of the subnet 10. To accomplish this, the administrative authority configures the border gateway 16 of the subnet 10 to perform a type of “reverse network address translation,” which transforms the IPv6-IPv4 compatibility address interface identifier 304 with embedded IPv4 address of the sending node into an anonymous ID for inter-domain routing outside the subnet 10. Within the subnet 10, the fully qualified IPv6-IPv4 compatibility address interface identifier 304 with the embedded IPv4 address of the sending node is still used to enable automatic IPv6-in-IPv4 tunneling, and the intra-domain routing of IPv6 packets follows the process 330 described above.

[0303] In one embodiment, the border gateway 16 advertises an IPv6 prefix 302 of 2002::/16 and the IPv6 prefix 302 of 2002:V4ADDR/48, where ‘V4ADDR’ is the globally unique embedded IPv4 address of the border gateway 16. IPv6-IPv4 compatibility addresses within the subnet 10 are constructed as the concatenation of a 2002:V4ADDR/48 prefix, a 16-bit SLA ID, and a 64-bit EUI64 interface identifier 304 as described above.

[0304] For example, if the IPv4 address of the border gateway is 140.173.0.1, the IPv4 address of the IPv4 node within the subnet 10 is 140.173.129.8, and the node resides within SLA ID 0x001, the IPv6-IPv4 compatibility address 316 within the subnet is constructed as:

[0305] 2002:8CAD:1:1:0200:5EFE:8CAD:8108,

[0306] where the ‘2002:’ is a predetermined prefix associated with the reverse network address translation; the ‘8CAD:1:’ is the IPv4 address (140.173.0.1) of the border gateway 16; the second ‘1:’ is the SLA ID; the ‘0200:5EFE’ is the IANA-specific OUI (with the ‘u’ bit set to global scope) and the type field 310 indicating that the compatibility address includes an embedded IPv4 address; and the ‘8CAD:8108’ is the embedded IPv4 address (140.173.129.8) of the internal IPv4 node.

[0307] The border gateway 16 performs “reverse network address translation” using an identifier not vulnerable to eavesdropping. The border gateway 16 maintains a mapping of the identifier to the actual IPv4 address of the IPv4 node in order to map messages from destinations back to the actual IPv4 node within the subnet 10. For example, if the border gateway 16 replaced the IPv4 address 140.173.129.8 with the identifier value 0x00000001, the IPv6-IPv4 compatibility address outside the subnet 10 is constructed as:

[0308] 2002:8CAD:1:1:0000:5EFE:0:1

[0309] Here again, the least significant octet of the EUI-64 interface identifier 304 has the ‘u’ bit set to 0 to indicate that the embedded IPv4 address is not globally unique.
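The mapping kept by the border gateway can be sketched as a small translation table. This is a hypothetical illustration: the class `ReverseNat` and its methods are invented for the example, and the sequential identifier is used only so that the result matches the example value 0x00000001. A real deployment would need an identifier that is not vulnerable to eavesdropping, as stated above.

```python
import ipaddress

IID_MASK = (1 << 64) - 1
ANON_OUI_TYPE = 0x00005EFE  # IANA OUI with 'u' = 0, followed by TYPE = 0xFE


class ReverseNat:
    """Hypothetical border-gateway table: internal interface identifier <-> anonymous ID."""

    def __init__(self):
        self._ident_for = {}   # internal 64-bit interface identifier -> anonymous ID
        self._iid_for = {}     # anonymous ID -> internal 64-bit interface identifier
        self._next_ident = 1   # sequential IDs, for illustration only

    def outbound(self, src):
        """Rewrite an internal source address for use outside the subnet."""
        value = int(ipaddress.IPv6Address(src))
        prefix, iid = value >> 64, value & IID_MASK
        if iid not in self._ident_for:
            self._ident_for[iid] = self._next_ident
            self._iid_for[self._next_ident] = iid
            self._next_ident += 1
        anon_iid = (ANON_OUI_TYPE << 32) | self._ident_for[iid]
        return ipaddress.IPv6Address((prefix << 64) | anon_iid)

    def inbound(self, dst):
        """Map an anonymous destination address back to the internal address."""
        value = int(ipaddress.IPv6Address(dst))
        prefix, ident = value >> 64, value & 0xFFFFFFFF
        return ipaddress.IPv6Address((prefix << 64) | self._iid_for[ident])
```

Storing the whole internal interface identifier, rather than only the IPv4 address, lets the gateway restore the original ‘u’ bit on the return path.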

[0310] The IPv6-in-IPv4 tunneling for inter-domain routing then derives the IPv4 source address from the IPv4 address of the numerous separate tunnel transitions for an IPv6 packet traveling from a sending node to a destination node. The transitions include (1) intra-domain tunnels from the IPv6 sending node through routers along the path to the border gateway for its domain, (2) inter-domain tunnels from the sending node's border gateway through other transit routers along the path to a border gateway for the destination, and (3) intra-domain tunnels from the destination node's border gateway through intra-domain routers along the path to the destination node itself. Thus, IPv4 addresses within the subnet are exposed across the public Internet 30.

[0311] Non Globally Unique IPv4 Addresses

[0312] Embodiments of the subnet 10 that use private, non-globally unique IPv4 addresses require a border gateway 16 that implements an inter-domain routing function as described above. For example, if the IPv4 address of the border gateway 16 is 140.173.0.1, the IPv4 address of an IPv4 node within the subnet 10 is 10.0.0.1, and the IPv4 node resides within SLA ID 0x001, the IPv6-IPv4 compatibility address within the subnet 10 is constructed as:

[0313] 2002:8CAD:1:1:0000:5EFE:0A00:1,

[0314] where again the least significant octet of the EUI-64 interface identifier 304 has the ‘u’ bit set to 0 to indicate that the embedded IPv4 address ‘0A00:1’ (10.0.0.1) is not globally unique.

[0315] The administrative authority for such embodiments of the subnet 10 may institute a policy that permits exposing non-globally unique IPv4 addresses to the public Internet 30. In this case, the reverse network address translation is unnecessary, but might be used to protect against eavesdropping on the non-globally unique addresses.

[0316] Additional Routing Considerations

[0317] In a different embodiment than that described in FIG. 16, each host 12 or router 14 that sends an IPv6 packet to an IPv6-IPv4 compatibility destination address follows the following process:

[0318] If the 64-bit IPv6 prefix of the IPv6-IPv4 compatibility destination address matches the 64-bit IPv6 prefix of one of the network interfaces, tunnel the packet through IPv4.

[0319] Otherwise, route the packet through IPv6.

[0320] From the above sending process, a sending node that does not have an interface which shares a common 64-bit routing prefix with the packet's IPv6-IPv4 compatibility destination address sends the packet to the next-hop gateway determined by an IPv6 routing table lookup. In short, when a sending node does not have an interface which shares a common 64-bit (site-level) routing prefix with an IPv6-IPv4 compatibility destination address, the sending rule is identical to that for a native IPv6 destination address. This decision is independent of whether the sending node has an IPv6-IPv4 compatibility address itself, or whether the sending node even comprises a dual-stack configuration. The sending node can be a native IPv6 node with no legacy IPv4 support.

[0321] When a sending node has an interface which shares a common 64-bit routing prefix with an IPv6-IPv4 compatibility destination address, the sending node must assume that the destination is not directly reachable at the data-link level, although the shared site-level routing prefix implies otherwise. Instead, if the sending node comprises a dual-stack configuration, it automatically tunnels the IPv6 packet to the IPv4 address embedded within the IPv6-IPv4 compatibility destination address' interface identifier. If the sending node is an IPv6-only node that does not comprise a dual-stack configuration, however, it has no means for automatically tunneling the packet via IPv4. In this case:

[0322] If the sending node is the host that originates the packet, the sending node sends the packet to a router that lists the 64-bit prefix in its router advertisements. If no such router exists, the sending node should drop the packet and return a “No route to host” error indication to the originating application. If the sending node is a router that forwards the packet, the sending node drops the packet and sends an ICMPv6 “Destination Unreachable” message to the source.

[0323] By implication, the scheme breaks down if a packet with an IPv6-IPv4 compatibility destination address reaches an IPv6-only router that has an interface that shares a common 64-bit routing prefix with the IPv6-IPv4 compatibility destination address. Additional mechanisms to address this issue may be possible, such as allowing dual-stack routers to advertise 96-bit prefixes which incorporate the special 32-bit EUI-64 interface identifier prefix: 0200:5EFE. A sending node can then interpret such an advertisement to mean that the advertising router comprises a dual stack and is capable of intra-site IPv6-in-IPv4 tunneling.
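The two-branch sending rule can be sketched in a few lines. This is a minimal sketch that assumes the destination is already known to be an IPv6-IPv4 compatibility address and that the sender is dual-stack; the function name `send_via` and the string results are hypothetical.

```python
import ipaddress


def send_via(dst, interface_prefixes):
    """Tunnel through IPv4 if dst shares a 64-bit prefix with a local interface."""
    addr = ipaddress.IPv6Address(dst)
    for prefix in interface_prefixes:
        net = ipaddress.IPv6Network(prefix)
        if net.prefixlen == 64 and addr in net:
            return "tunnel-ipv4"
    # No shared 64-bit prefix: same rule as for a native IPv6 destination.
    return "route-ipv6"
```

Only the 64-bit prefix comparison is modeled here; selection of the next-hop gateway in the IPv6 case is left to the ordinary routing table lookup.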

[0324] Incremental IPv6 Deployment Examples

[0325] When deploying an IPv6 node in a subnet that is predominantlyIPv4, the embedded IPv4 address within an IPv6-IPv4 compatibilityassigned to that IPv6 node does not need to be globally unique. Theembedded IPv4 address needs only be topologically correct for and uniquewithin the context of that subnet 10. Also, when deployed in apredominantly IPv4 subnet, the deployed IPv6 node is unlikely to share acommon multiple access data-link with an IPv6 router 14 in the subnet.Because the IPv6 node does not share a common multiple access data-linkwith the IPv6 router, no router advertisements are available. IPv6-IPv4compatibility addresses enable the IPv6 node to join the global IPv6network (i.e., on the Internet 30) by automatically tunneling IPv6messages through the intra-site IPv4 routing infrastructure. For thispurpose, the deployed IPv6 node requires two pieces of staticconfiguration information: the 64-bit IPv6 network prefix for the subnet10 and the IPv4 address of the dual-stack IPv6 gateway 16 servicing thesubnet 10. No other pre-configured tunnel state information is required.

[0326] For example, consider a researcher who wishes to configure IPv6on his existing IPv4-based workstation, but the network administratorsfor the subnet 10 have not yet configured an IPv6 router for theworkstation's LAN. The researcher is aware of a dual-stack IPv6 routerelsewhere within the subnet 10 (which may be several IPv4 router hopsaway from his workstation's LAN) and sets the 64-bit IPv6 address prefixand IPv4 address of the router as configuration information on hisworkstation.

[0327] This configuration information is used to construct two IPv6-IPv4compatibility addresses. One is the concatenation of the IPv6 prefix andthe IPv4 address of the router to construct the IPv6-IPv4 compatibilityaddress for the router. The researcher's workstation uses this IPv6-IPv4compatibility address of the router as its default IPv6 gateway address.The second address is the concatenation of the IPv6 prefix and the IPv4address of the researcher's workstation to construct the IPv6-IPv4compatibility address which the workstation uses as its own IPv6 sourceaddress. The researcher's workstation can now access the global IPv6Internet 30 by first tunneling messages through the subnet-local IPv4routing infrastructure to the IPv6 router. The IPv6 router then routesthe IPv6 messages. No static configuration information is needed on theIPv6 router on behalf of the researcher's workstation.

[0328] As another example, a network administrative authority wishes to configure IPv6 on an existing IPv4 subnet under their jurisdiction, but the subnet is separated from the IPv6 border gateway 16 for the subnet by other IPv4 subnets, which are not ready for IPv6 deployment. The administrator configures a dual-stack IPv6 router (or routers) for his administrative domain by arranging for SLA (site-level aggregation)-based subnet allocation(s) from the owner of the IPv6 border gateway for the subnet. The administrator further sets the 64-bit IPv6 address prefix and IPv4 address of the border gateway as configuration information on his router. The router(s) for the administrative domain can now access the global IPv6 Internet by first tunneling messages through the site-local IPv4 routing domain to the IPv6 border gateway for the site. Hosts and/or other IPv6 routers which share a common multiple access data-link with the router receive router advertisements from which they can construct native IPv6 addresses with topologically-correct 64-bit prefixes and interface identifiers via address auto-configuration. The IPv6 border gateway for the site need only have routing information that points to the router(s) for the SLA-based subnet allocations.

[0329] Automatic Deprecation

[0330] As seen in the above deployment examples, the IPv6-IPv4 compatibility address format enables incremental IPv6 deployment for hosts and routers within sites that have incomplete or “sparse” IPv6 coverage at the network infrastructure level. In general, IPv6-IPv4 compatibility addresses are intended for use by nodes 18 that do not receive router advertisements because such nodes 18 do not share a common multiple access data-link with an IPv6 router. When router advertisements become available, such as when an IPv6 router is deployed on a common multiple access data-link shared by the node 18, the node 18 can discontinue use of its IPv6-IPv4 compatibility address and adopt a native IPv6 unicast address using address auto-configuration for a prefix discovered through router discovery. In this way, IPv6-IPv4 compatibility addresses can gradually and automatically disappear as IPv6 nodes become widely deployed within the subnet 10. The following automatic deprecation rule for hosts and routers using IPv6-IPv4 compatibility addresses can be used to transition from the use of IPv6-IPv4 compatibility addresses:

[0331] While no IPv6 router advertisements are received, continue to use the IPv6-IPv4 compatibility address. If router advertisements ensue, discontinue use of the IPv6-IPv4 compatibility address and construct a native IPv6 address based on prefix information carried in the router advertisements.
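
The deprecation rule can be sketched as a small state holder. The class and method names (`AddressState`, `on_router_advertisement`) are hypothetical, and the native interface identifier is assumed to be supplied by the caller as a 64-bit integer.

```python
import ipaddress

class AddressState:
    """Tracks which IPv6 address a node currently uses (illustrative sketch)."""

    def __init__(self, compat_addr: str, native_interface_id: int):
        self.active = ipaddress.IPv6Address(compat_addr)
        self.native_interface_id = native_interface_id  # assumed 64-bit value
        self.using_compat = True  # no router advertisement received yet

    def on_router_advertisement(self, prefix64: str) -> None:
        # A router advertisement arrived: stop using the compatibility
        # address and build a native address from the advertised prefix.
        net = ipaddress.IPv6Network(prefix64)
        self.active = ipaddress.IPv6Address(
            int(net.network_address) | self.native_interface_id
        )
        self.using_compat = False
```

Until `on_router_advertisement` is called, the node simply keeps using its compatibility address.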

[0332] Address Selection

[0333] To ensure efficient routing within the destination's subnet when multiple IPv6 destination address alternatives are available, a “second-tier” address selection policy is used for choosing between an IPv6-IPv4 compatibility address and a native IPv6 address. If multiple alternatives remain after address selection has been applied on the 64-bit routing prefixes, and if at least one of the remaining alternatives is constructed with a native IPv6 interface identifier (one that does not contain an embedded IPv4 address), select a native IPv6 address. Otherwise, select an IPv6-IPv4 compatibility address.
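
A minimal sketch of the second-tier policy follows. The function name `second_tier_select` is hypothetical, and the test for an embedded IPv4 address is passed in as a caller-supplied predicate because it depends on the compatibility address format.

```python
def second_tier_select(candidates, embeds_ipv4):
    """Pick among addresses that already passed prefix-based selection.

    candidates:  remaining destination addresses, in preference order
    embeds_ipv4: predicate returning True for an IPv6-IPv4 compatibility
                 address (assumed to be supplied by the caller)
    """
    if not candidates:
        raise ValueError("no candidate addresses")
    for addr in candidates:
        if not embeds_ipv4(addr):
            return addr          # a native IPv6 address is preferred
    return candidates[0]         # otherwise use a compatibility address
```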

[0334] 4. Updating Information Upon Resuming Interrupted Communications

[0335] Referring again to FIG. 1, assume that the mobile node 12, hereafter “client 12”, and the server 40 are communicating over a route or path through the subnet 10 that includes one or more wireless links. Movement by the client 12 or by another node 14 in the subnet 10 may cause the client 12 to move in and out of communication range of the subnet 10. For example, the client 12 may move to a new position in the subnet 10 (as indicated by arrow 27) or to the foreign subnet 20 (as indicated by arrow 29). While moving, the client 12 may break a current communication link (e.g., link 24) to the subnet 10 and be out of range of all routing nodes 14 within the subnet 10. As another example, the node B may move out of range of the client 12, placing the client 12 out of range of the subnet 10 if the client 12 is not within range of another routing node 14 in the subnet 10. Consequently, the client 12 is not communicating with the server 40 and may not access information, particularly updated information, from the server 40. The inability to obtain updated, timely information may cause resources associated with the client 12 to be inefficiently used and adversely affect the operation of the client 12.

[0336] To lessen any adverse impact of client movement, the client 12 and the server 40 can (1) use message queues to store communications affected by an interruption for subsequent transmission if communications between the client 12 and the server 40 are resumed; and (2) use bandwidth adaptation techniques to maintain a persistent connection between the client 12 and the server 40 although a route between the client 12 and the server 40 is momentarily lost.

[0337] Message Queues

[0338] The client 12 may register an interest in certain data on server 40. In one embodiment, the data are an object. Objects, as will be appreciated by those skilled in the art, are generally programming units that include data and functionality, and are instances of classes. For example, the client 12 may be interested in updated information pertaining to particular objects. In one embodiment, server 40 may also include meta-objects, which are objects that have no physical representation, and are classes with methods and attributes that serve as a factory to create new objects. Meta-objects may not be instantiated (i.e., meta-objects generally do not provide a representation of a physical object). Instead, meta-objects may serve as templates from which objects that represent physical objects are constructed.

[0339] The client 12 maintains local copies of objects on the server 40 and updates these objects, as necessary, when communicating with the server 40. Relevant objects associated with the server 40 may be replicated (i.e., databases associated with server 40 may be replicated) on the client 12 to provide the client 12 with the local copies of objects. Local copies provide the client 12 with access to relatively up-to-date information should the client 12 move out of the communications range of the subnet 10, interrupting communications with the server 40. While the link 24 is broken, the local copies of the objects, which are active entities, can continue to run on the client 12. Then when the client 12 moves back into the communications range of subnet 10 or the subnet 20 and reestablishes communications with the server 40 over the same or a different route, the server 40 can provide the client 12 with updated, or current, information. That is, the server 40 may update the local copies of the objects that are present on the client 12 in, for example, a local cache.

[0340] In general, because communications between the server 40 and the client 12 are synchronous, the server 40 is aware of all objects that are associated with the client 12. Server 40 may then be able to save state information associated with the client 12. Therefore, server 40 may restore the current state of the client 12 as appropriate (e.g., when lost link 24 is re-established). It should be appreciated, however, that server 40 is generally not aware of any semantics with regard to objects. Rather, the server 40 is only aware that objects have been updated, and, further, that the corresponding updates should be forwarded to the client 12 as appropriate.

[0341] Server 40 includes an object list that is a list of all objects associated with the server 40 and which are to be updated. In other words, the object list is a queue of object updates. The client 12 may communicate substantially with the server 40 after the client 12 is registered with respect to server 40. That is, client 12 may send commands to server 40. In one embodiment, such commands include lists of topics in which the client 12 is interested. The server 40 may send update messages to the client 12 to indicate that certain objects on the client 12 should be updated such that the states of the objects on the client 12 are consistent with the states of the corresponding objects on the server 40. The client 12 also includes an object list (i.e., a client object list) that contains substantially all objects that are associated with the client 12. In general, the new client object list contains all objects which are associated with the server 40 and in which the client is “interested”.

[0342] The client 12 communicates with the server 40 over a route (or path) through the subnet 10 or subnet 20 determined by the routing nodes 14. The client 12 may transmit data to the server 40 directly or through a message queue. The client 12 queues data on the message queue when, for example, data has been modified and is to be sent to the server 40. Specifically, when the client 12 creates or modifies data, the data is sent to the server 40 through the message queue. The communications between the client 12 and the message queue may, in the described embodiment, be performed using a potentially unreliable communications link (e.g., wireless link), while the communications between the message queue and server 40 are typically more reliable (e.g., wired link).

[0343] Data is placed on the message queue by the client 12, and is removed from the message queue by the server 40 or, more specifically, communications software associated with server 40. Data is removed from the message queue after the data has been successfully received by the server 40.

[0344] When the client 12 creates data (e.g., objects), the client 12 typically associates that data with a unique identifier that is used by the client 12 and the server 40 to identify that data. One example of a unique identifier is a timestamp. The associated timestamp is updated each time the data are updated or modified by the client 12. A timestamp essentially prevents data conflicts from arising when more than one client attempts to modify that data at a given time. Specifically, timestamps are monotonically increasing such that substantially no data conflicts between unique identifiers can arise. Other embodiments use different mechanisms to uniquely identify data, such as an authorization level that is associated with the users of a particular client; a priority level that is associated with the particular type of data; and the order in which the data are received (e.g., LIFO, FIFO).

[0345] Similarly, the server 40 can communicate directly to the client 12 or through a message queue(s), or lists, for storing objects in which the client 12 has indicated an interest. The message queue(s) can be part of or separate from the server 40. Hence, data may be transmitted to the client 12 from the server 40 through such message queues. In other words, the server 40 may use substantially the same heuristics as the client 12 for sending data. Data is placed on the message queue by the server 40 and removed from the message queue by the client 12. Again, data is removed from the message queue when the client 12 has successfully received that data.

[0346] 5. Adaptive Use of Network Bandwidth

[0347] In general, within the subnet 10, which includes wireless links, a variety of different failures can interrupt or cause a communications outage. For example, a failure may be due to a hardware problem at either the client 12 or the server 40. A failure may also be the result of a software problem (e.g., data may be successfully received but acknowledgement of the receipt may fail). Failures may also occur because of problems with links, as mentioned previously. Such failures may include a failure of any link on a route between the client 12 and the server 40. It should be appreciated that in some cases, more than one failure may occur at any given time.

[0348] To adaptively handle interruptions in communications between the client 12 and the server 40, the internetworking system 2 may run diagnostics to determine the cause of the interruption. After the cause is determined, the system 2 makes corrections that restore communications. Depending upon current system parameters, such adaptive corrections include, but are not limited to, attempting (1) to reestablish the same interrupted connection between the client 12 and the server 40, (2) to establish a connection between the client 12 and a redundant server, or (3) to establish a new connection between the client 12 and the server 40. Other techniques used alone or in combination with the aforementioned corrections include varying and/or increasing the waiting period between unsuccessful attempts to establish a connection between the client 12 and the server 40 and adjusting the length of transmitted packets. Such techniques can be used in response to current bandwidth conditions in the subnet 10.

[0349] When a “network dropout” occurs in the subnet 10 (e.g., when the client 12 or the server 40 appears to be out of communication with the subnet 10), standard client-server systems, such as those based upon TCP/IP, typically operate under the assumption that the failure is due to network congestion. As will be understood by those skilled in the art, although a network dropout in a low-bandwidth, wireless subnet may indeed occur as a result of network congestion, the network dropout may also occur for a variety of other reasons including, but not limited to, packet loss due to coverage problems.

[0350] Packet loss associated with a network typically involves either the failure of transmission of a packet of data or the loss of some of the data transmitted in a packet. Although packet losses can occur for any number of reasons, packet losses often occur when the client 12 is at least temporarily out of range of the subnet 10 or when a communication link in the route between the client 12 and the server 40 is temporarily interrupted.

[0351] By counting the number of packets sent and the total number of packets acknowledged, the packet loss in a system may be determined. Measuring packet loss enables the manner in which packets are resent or rebroadcast to be dynamically changed such that the resending of packets is substantially optimized with respect to the network.

[0352] The internetworking system 2 can use the measure of packet loss to determine the length of packets that are transmitted between the client 12 and the server 40. For example, if packets with lengths of 1000 bytes experience a 15% packet loss, and packets with lengths of 100 bytes experience a 1% packet loss, then the client 12 and server 40 can tune the length of transmitted packets to minimize the percentage of packets that fail to reach their destination. A factor in determining the packet length is the tradeoff between data throughput and the percentage of packet loss. That is, the smaller the packet length, the greater the percentage of packets that reach their destination, but the lower the percentage of payload (i.e., data) transmitted in each packet because each packet also carries a number of overhead bits.
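
The tradeoff can be made concrete with a short calculation. The sketch below uses the 1000-byte and 100-byte loss figures from the example; the per-packet overhead of 40 bytes and the function names `packet_loss` and `delivered_payload_fraction` are assumptions introduced only for illustration.

```python
def packet_loss(sent: int, acknowledged: int) -> float:
    """Fraction of packets lost, from counts of packets sent and acknowledged."""
    if sent <= 0:
        raise ValueError("no packets sent")
    return 1.0 - acknowledged / sent

def delivered_payload_fraction(packet_len: int, loss: float, overhead: int) -> float:
    """Fraction of transmitted bytes that arrive as useful payload."""
    payload = packet_len - overhead
    return (1.0 - loss) * payload / packet_len

OVERHEAD = 40                                # assumed overhead bytes per packet
measured_loss = {1000: 0.15, 100: 0.01}      # packet length -> measured loss

# Choose the length that delivers the most payload per transmitted byte.
best_len = max(
    measured_loss,
    key=lambda n: delivered_payload_fraction(n, measured_loss[n], OVERHEAD),
)
```

Under this assumed overhead, the 1000-byte packets deliver about 0.816 of the transmitted bytes as payload and the 100-byte packets about 0.594, so the length with the lower loss percentage is not necessarily the better choice for throughput.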

[0353] Also, the client 12 and the server 40 can dynamically adjust the packet length based upon packet loss measurements that are taken periodically. The client 12 and/or the server 40 can make such packet length adjustments. Further, the client 12 can use a packet length that differs from the packet length used by the server 40; packets transmitted from the client 12 to the server 40 may take a different route with different bandwidth capabilities than packets transmitted from the server 40 to the client 12.

[0354] When a network dropout occurs due to network congestion, repeated attempts may be made to reconnect a “dropped out” client 12 or server 40 to the subnet 10. If a network dropout occurs due to packet loss and attempts are made to reconnect the client 12 or the server 40 to the subnet 10, the overall performance of the subnet 10 may degrade to a point where the overall performance of the subnet 10 is unacceptable. That is, attempting to initiate a connection that in fact has not been lost may preclude other connections from being made, thereby preventing the transmission of data which would be made across those connections.

[0355] Although a variety of different methods may be used to actually determine if a network dropout is the result of network congestion or of packet loss, such a determination may be made using ongoing statistical measurements. Alternatively, the speed at which data is transmitted may be changed. When a network dropout is due to packet loss, changing the speed of data transmission often solves the network dropout. However, when network dropout is due to network congestion, changing the speed of data transmission may have no effect and may worsen the throughput.

[0356] In order to enable communications to be optimized to reflect actual network conditions, the client-server system may measure the roundtrip time for packet transmission. That is, the amount of time that elapses while a packet of data is transmitted from the client 12 to the server 40, or vice versa, may be measured. Although the measurements may be used for substantially any purpose, the measurements are often used to characterize the quality of a connection or route between the client 12 and the server 40. By way of example, for certain networks the duration of a roundtrip can indicate whether a connection is good; short roundtrips are associated with good connections, while long roundtrips are associated with poor connections. The measurements of roundtrip times for a variety of different packets may further be used to statistically determine how long to wait between attempts to resend an unsuccessfully sent packet.
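
One possible way to turn roundtrip measurements into a resend wait is sketched below. The particular statistic used (mean plus a multiple of the standard deviation) and the names `resend_wait` and `k` are assumptions chosen for illustration; no specific formula is prescribed here.

```python
import statistics

def resend_wait(roundtrip_times, k=2.0):
    """Suggest how long to wait before resending an unacknowledged packet.

    roundtrip_times: recent roundtrip measurements, in seconds
    k:               assumed safety multiplier on the spread of the samples
    """
    if not roundtrip_times:
        raise ValueError("no roundtrip measurements available")
    mean = statistics.mean(roundtrip_times)
    spread = statistics.pstdev(roundtrip_times)
    return mean + k * spread
```

With this choice, a steady connection yields a wait close to the average roundtrip time, while a connection with widely varying roundtrips yields a longer wait.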

[0357] FIG. 17 shows an embodiment of a process used by the client 12 and the server 40 to establish and maintain a persistent connection in a dynamically changing network environment using the above-described bandwidth adaptation techniques. Although the process is described generally from the perspective of the client 12 sending messages to the server 40, the process also applies to when the server 40 sends messages to the client 12. The process begins (step 350) by establishing communications between the client 12 and the server 40. Attempts to establish a connection with the server 40 can begin when the client 12 comes within range of the subnet 10. The client 12 sends a packet and awaits a reply from the server 40. The client 12 then waits a specified period of time. If that period elapses without receiving a response, the client 12 attempts again to establish a connection with the server 40 by sending another packet. Again, the client 12 waits a specified period of time, but the current waiting period is longer than the previous waiting period. By waiting for a longer period (i.e., “backing off”) on the subsequent connection attempt, the client 12 is accommodating the dynamic and intermittent quality of mobile wireless networks by giving any response from the server 40 additional time to arrive at the client 12. If the new waiting period also times out, the client 12 sends the packet to the server 40 yet again and waits a still longer period for the reply from the server 40 that establishes the connection.

[0358] Under some circumstances, numerous clients 12 (e.g., 200) may arrive within range of the subnet 10 simultaneously, each attempting to establish a connection with the server 40. For example, consider a military “invasion” scenario in which each participant is equipped with a portable computer capable of establishing a wireless connection to the subnet 10 and thus of communicating with the server 40. These computers are used to coordinate the military invasion and to assist in pinpointing the position of each individual during the operation. An onslaught of connection attempts could overwhelm the subnet 10 and the server 40 with packets such that only a portion of the computers are able to successfully establish a connection with the server 40. If each computer then backed off for approximately the same period of time before attempting again to connect to the server 40, the outcome might be the same; namely, another onslaught of connection packets that impedes some of the computers from establishing a connection. Thus, in one embodiment, the computers are configured so that the back-off period is not the same for each of the computers, causing the attempts to connect to the server 40 to be staggered. That is, some computers 12 wait for longer periods than other computers before sending another connection packet to the server 40.
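
A staggered back-off of this kind can be sketched as follows. Randomizing the waiting period is one assumed way of making the back-off differ between computers; the function name `backoff_delays` and the parameters `base`, `factor`, and `spread` are hypothetical.

```python
import random

def backoff_delays(base, attempts, rng, factor=2.0, spread=0.5):
    """Return successive waiting periods for connection attempts.

    Each period is longer than the previous one (backing off), and a random
    component staggers the periods across different computers.
    Assumes factor > 1 + spread so that the periods always increase.
    """
    delays = []
    wait = base
    for _ in range(attempts):
        delays.append(wait * (1.0 + rng.uniform(0.0, spread)))
        wait *= factor
    return delays

# Two computers seeded differently wait different amounts of time.
computer_a = backoff_delays(1.0, 4, random.Random(1))
computer_b = backoff_delays(1.0, 4, random.Random(2))
```

Any other rule that gives each computer a distinct waiting period (for example, one derived from a node identifier) would serve the same purpose of staggering the connection attempts.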

[0359] After communications are established over a route through the subnet 10 that includes one or more wireless links, the client 12 identifies (step 354) a packet of data that is to be sent to the server 40 as having been sent. After identifying the packet as having been sent, the client 12 transmits (step 358) the packet through the subnet 10 over a route determined by the routing nodes 14. The packet may be queued in a message queue and sent to the server 40 based on prioritization within the message queue. Similarly, in some embodiments, if a packet is being sent from the server 40 to the client 12, the packet may also be added to a message queue and sent to the client 12 as determined by priorities assigned within the message queue.

[0360] When the packet that is sent includes data that is to be updated, the data may be sent in a variety of different forms. That is, within an object-based system, when an object is modified, either the entire object may be sent in a packet, or substantially only the changes to the object may be sent in a packet. By way of example, when an object has a size that is smaller than a predetermined threshold, the entire object is sent in a packet. Alternatively, when the object is larger than that threshold, the updates or changes to that object alone may be sent in a packet, although the entire object may also be sent.

[0361] The client 12 then determines (step 362) whether it has received an acknowledgement from the server 40 indicating that the server 40 received the packet. The client 12 may make the determination after a predetermined amount of time has elapsed. Receipt of the acknowledgement indicates that the packet has been successfully transmitted and received. Hence, the client 12 identifies (step 366) the packet as being successfully sent and received, and the process of sending data is completed.

[0362] If the client 12 instead determines that no acknowledgement of the packet has been received, then this is an indication that there may have been a failure in the network that prevented the server 40 from receiving the packet. Such failures may include, but are not limited to, a failure of the client 12, of the server 40, or of a communication link in a route between the client 12 and the server 40. The failures may also be due to packet loss, and not to a physical failure of any component of the overall system.

[0363] When the packet has not been successfully received, then a determination is made (step 370) as to whether a maximum number of resend tries has been exceeded. The maximum number of attempts to send a packet between the client 12 and the server 40 may generally be widely varied, and is typically determined using statistical models based upon the measured behavior of the overall system. The maximum number of resend tries may be updated at any suitable time during the operation of the overall system. By way of example, the maximum number of resend tries may be calculated and, if necessary, revised, whenever the accumulation of statistical information reaches a certain level.

[0364] When the client 12 determines that the maximum number of resend tries has not been exceeded, another attempt is made (step 358) to send the packet. As mentioned above, the amount of time to wait between resend tries may be based upon statistical calculations based upon information that includes the average roundtrip time for a transmitted packet. An attempt to resend a packet can be successful when the initial failure in sending the packet was the result of packet loss.

[0365] On the other hand, if it is determined that the maximum number of resend tries has been exceeded, then attempts to send the packet over the potentially existing link are aborted. When the maximum number of resend tries has been exceeded, and acknowledgement of packet receipt still has not been received, then it is likely that there has been an interruption of the communications link, and that the unsuccessful sending of data was likely not due to packet losses. Accordingly, an attempt is made (step 374) to reestablish communications between the client 12 and the server 40.

[0366] A determination is then made (step 378) as to whether communications between the client 12 and the server 40 have been successfully reestablished. If the determination is that communications have been successfully reestablished, the packet is sent (step 358). Alternatively, when it is determined that communications between the client 12 and the server 40 have not been reestablished, then another attempt is made (step 374) to reestablish communications. The number of attempts to reestablish communications between the client 12 and the server 40 may be limited in some cases. In one embodiment, attempts to re-establish communications may be aborted after a predetermined number of attempts have been reached. In other embodiments, when the number of attempts to re-establish communications is limited, the number of attempts that are made may be substantially dynamically determined based on statistical information gathered during the course of communications between client 12 and the server 40. In still another embodiment, after the limit on the number of attempts to establish a connection with the server 40 is reached, the attempts to establish a connection for the client 12 can continue with a different server upon which the data are replicated.
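
The control flow of steps 358 through 378 can be summarized in a short sketch. The function `send_with_recovery` and its parameters are hypothetical; `transmit` and `reconnect` stand in for the actual network operations and are assumed to return True on an acknowledgement and on a reestablished connection, respectively. Waiting between tries and failover to a replicated server are omitted for brevity.

```python
def send_with_recovery(packet, transmit, reconnect, max_resends=3, max_reconnects=2):
    """Send a packet, resending first and reconnecting only as a last resort.

    Returns True once the packet is acknowledged, or False if the limit on
    attempts to reestablish communications is reached.
    """
    reconnect_attempts = 0
    while True:
        # Steps 358, 362, 370: initial try plus up to max_resends resend tries.
        for _ in range(1 + max_resends):
            if transmit(packet):
                return True          # step 366: sent and received
        # Steps 374, 378: resend tries exhausted, so the link is presumed
        # interrupted; try to reestablish communications.
        while True:
            if reconnect_attempts >= max_reconnects:
                return False         # give up on this server
            reconnect_attempts += 1
            if reconnect():
                break                # reestablished; go back to step 358
```

Because reconnection is attempted only after the resend tries are exhausted, a packet lost on a link that is still intact is recovered by resending alone.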

[0367] By limiting the number of attempts made to send data and, further, by not first attempting to re-establish communications which may not actually have been interrupted, the amount of available communications bandwidth in a system may be substantially optimized. The bandwidth may be allocated to making actual connections which are required, rather than wasting the bandwidth by immediately attempting to re-establish communications when such re-establishment is not necessary.

[0368] In one embodiment, when an attempt is made to send data from the client 12 to the server 40, the data is queued on a message queue such that the data is prioritized for transmission to the server. Generally, a single message queue may be shared between multiple servers.

[0369] FIGS. 18A and 18B are a diagrammatic representation of the updating of an embodiment of a message queue 380 in accordance with an embodiment of the invention. As mentioned above, when data are created or modified, a timestamp accompanying the data is set or modified, respectively. A message queue 380 is shown at time t3 as including objects that were previously modified and have not yet been accepted by (i.e., sent to and successfully received by) the server 40. At time t3, at the head 382 of the message queue 380 is an object “obj 1” that was modified at time t1, followed by an object “obj 6” that was modified at time t2, and an object “obj 9” that was modified at time t3. In the described embodiment, the queue 380 is prioritized in a first-in-first-out (FIFO) manner, although priority can instead be based on a variety of other factors.

[0370] At time t5, the queue 380 further includes an object “obj 3” that was modified at time t4. Also at time t5, object “obj 6” is being modified such that its corresponding timestamp is updated accordingly. Further, at time t6, an object “obj 4” is modified. In one embodiment, the object “obj 6” that was modified at time t2 is superseded by a version of object “obj 6” that is updated at time t5. That is, the object “obj 6” at timestamp t2 has been replaced with the object “obj 6” at timestamp t5. As shown in FIG. 18B, at time t6, the message queue 380 no longer includes the object “obj 6” that was modified at time t2 and, instead, includes object “obj 6” that was modified at time t5. Within the queue 380, object “obj 6” in one embodiment does not take the priority of object “obj 6” at time t2, which has been removed. Instead, object “obj 6” takes a chronological position within the message queue 380 according to its modification at time t5, after object “obj 3” and before object “obj 4”. A variety of techniques for prioritizing objects within the message queue 380 is described in co-pending patent application entitled “Method and Apparatus for Updating Information in a Low-Bandwidth Client/Server Object-Oriented System”, Ser. No. 09/518,753, which is incorporated by reference herein in its entirety for all purposes.
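
The queue behavior of FIGS. 18A and 18B can be sketched as follows. The class `UpdateQueue` and its methods are hypothetical names, timestamps are represented as increasing integers, and updates are assumed to be queued in timestamp order so that insertion order equals chronological order.

```python
class UpdateQueue:
    """FIFO queue of pending object updates in which a newer version of an
    object supersedes the queued older version (illustrative sketch)."""

    def __init__(self):
        # Maps object id -> timestamp; a dict preserves insertion order.
        self._pending = {}

    def put(self, obj_id, timestamp):
        # Drop any superseded version, then queue the new version at the
        # tail so it takes its chronological position.
        self._pending.pop(obj_id, None)
        self._pending[obj_id] = timestamp

    def head(self):
        # Next (object id, timestamp) to transmit, or None if empty.
        return next(iter(self._pending.items()), None)

    def acknowledge(self, obj_id, timestamp):
        # Remove an entry only after that exact version was received.
        if self._pending.get(obj_id) == timestamp:
            del self._pending[obj_id]

    def order(self):
        return list(self._pending)

queue = UpdateQueue()
for obj_id, t in [("obj 1", 1), ("obj 6", 2), ("obj 9", 3),
                  ("obj 3", 4), ("obj 6", 5), ("obj 4", 6)]:
    queue.put(obj_id, t)
```

After these updates the queue holds “obj 1”, “obj 9”, “obj 3”, “obj 6”, and “obj 4” in that order, with “obj 6” carrying its newer timestamp.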

[0371] While the invention has been shown and described with reference to specific preferred embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the following claims. For example, although the described embodiments illustrate the principles of the invention with respect to wireless networks, such also apply to wire-line networks.

What is claimed is:
 1. In a multi-hop network including a plurality of nodes, a method for disseminating topology and link-state information over the multi-hop network, comprising: maintaining a path tree for each source node in the network that can produce an update message, each path tree having that source node as a root node, a parent node, and zero or more children nodes; receiving an update message from the parent node in the path tree maintained for the source node that originated the received update message, the update message including information related to a link in the network; and determining whether to forward the update message to children nodes, if any, in the path tree maintained for the source node that originated the update message in response to the information in the received update message.
 2. The method of claim 1 wherein the information related to the link indicates whether the update message is to be forwarded to other nodes.
 3. The method of claim 1 wherein the path tree associated with each source node is a minimum-hop-path tree.
 4. The method of claim 1 further comprising obtaining link-state information from one or more nodes in the path tree maintained for a given source node for use in developing the path tree to that source node.
 5. The method of claim 1 wherein the link is a wireless communication link.
 6. The method of claim 1 further comprising sending a new parent message to a node selecting that node as a new parent node for the source node originating the update message.
 7. The method of claim 6 further comprising receiving from the new parent node in response to the new parent message link-state information associated with the source node that originated the update message.
 8. The method of claim 7 wherein the new parent message included a serial number and the link-state information received in response to the new parent message is associated with update messages having serial numbers that are greater than the serial number included in the new parent message.
 9. The method of claim 1 further comprising: determining that a path through a new parent node for the source node originating the update message has the same number of node hops as the path through the current parent node, and maintaining the current parent node as the parent node for the given source node.
 10. The method of claim 1 further comprising: determining that a path to the source node originating the update message ceases to exist; and maintaining the current parent node as the parent node for the source node.
 11. The method of claim 1 further comprising broadcasting the update message to the children nodes if the number of children nodes exceeds a predefined threshold when forwarding the update message to children nodes.
 12. The method of claim 1 further comprising transmitting the update message to each child node using a unicast mode if the number of children nodes is less than a predefined threshold when forwarding the update message to children nodes.
 13. The method of claim 1 further comprising: computing a parent node for each neighbor node and source node; and determining which neighbor nodes are children nodes for a given source node.
 14. A network, comprising: a plurality of nodes in communication with each other over communication links, each node maintaining a path tree for each source node in the network that can produce an update message, each path tree having that source node as a root node, a parent node, and zero or more children nodes, wherein one of the nodes (i) receives an update message from the parent node in the path tree maintained for the source node that originated the received update message, the update message including information related to a link in the network, and (ii) determines whether to forward the update message to children nodes, if any, in the path tree maintained for the source node that originated the update message in response to the information in the received update message.